Recently, I passed a job interview where we discussed a practical use case related to intelligent document processing and automation. In this blog series, I’ll attempt to implement key parts of that use case to evaluate how well I can design such a system and integrate the relevant tools and software.
The scenario breaks down into three parts, one per post:
Part 1 – Ingest Data (Zoho API)
Part 2 – Parse Data (LlamaIndex)
Part 3 – Build AI Agent (LangChain + RAG)
To begin, I set up a custom domain to test the Zoho integration. Authentication with the Zoho API is handled via OAuth 2.0.
I followed the guide at Zoho OAuth Request Docs and first created a Server-based Application, which provided me with a Client ID and Client Secret.
To request the authorization code:
https://accounts.zoho.eu/oauth/v2/auth?
scope=ZohoMail.messages.READ,ZohoMail.accounts.READ&
client_id=1000.xxxx&
response_type=code&
access_type=offline&
redirect_uri=https://yourdomain/oauth/callback
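If you would rather assemble this URL in code than by hand, here is a minimal Python sketch. The function name build_auth_url is my own, the values are the placeholders from above, and I am assuming Zoho accepts the percent-encoded form of the scope and redirect URI that urlencode produces.

```python
from urllib.parse import urlencode

# EU data center, as used throughout this post; other regions use other hosts.
ACCOUNTS_SERVER = "https://accounts.zoho.eu"


def build_auth_url(client_id: str, redirect_uri: str, scopes: list[str]) -> str:
    """Return the consent URL to open in a browser (hypothetical helper)."""
    params = {
        "scope": ",".join(scopes),       # comma-separated, as in the URL above
        "client_id": client_id,
        "response_type": "code",
        "access_type": "offline",
        "redirect_uri": redirect_uri,
    }
    # urlencode percent-encodes reserved characters such as ',' ':' and '/'.
    return f"{ACCOUNTS_SERVER}/oauth/v2/auth?{urlencode(params)}"


auth_url = build_auth_url(
    "1000.xxxx",
    "https://yourdomain/oauth/callback",
    ["ZohoMail.messages.READ", "ZohoMail.accounts.READ"],
)
print(auth_url)
```

Open the printed URL in a browser while logged in to Zoho to reach the consent screen.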
After you approve the request, Zoho redirects the browser to your callback URL with the authorization code attached:
https://yourdomain/oauth/callback?
code=1000.yyyy.zzzz&
location=eu&
accounts-server=https%3A%2F%2Faccounts.zoho.eu
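On the receiving side, the callback handler only needs to pull these query parameters out of the URL. A rough sketch using the standard library follows; parse_callback is a name I made up, and the error handling is deliberately minimal.

```python
from urllib.parse import parse_qs, urlsplit


def parse_callback(url: str) -> dict:
    """Extract the OAuth fields from the redirect URL (hypothetical helper)."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes values
    if "code" not in query:
        raise ValueError(f"no authorization code in callback: {sorted(query)}")
    return {
        "code": query["code"][0],
        "location": query.get("location", [None])[0],
        "accounts_server": query.get("accounts-server", [None])[0],
    }


callback = parse_callback(
    "https://yourdomain/oauth/callback"
    "?code=1000.yyyy.zzzz&location=eu"
    "&accounts-server=https%3A%2F%2Faccounts.zoho.eu"
)
print(callback["code"], callback["accounts_server"])
```

The accounts-server value tells you which host to send the token request to, which matters if your account lives in a different data center than the one you started from.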
Exchange the authorization code for an access token and a refresh token:
curl --request POST \
--url "https://accounts.zoho.eu/oauth/v2/token" \
--data "grant_type=authorization_code" \
--data "client_id=1000.xxxx" \
--data "client_secret=your_client_secret" \
--data "redirect_uri=https://yourdomain/oauth/callback" \
--data "code=1000.yyyy.zzzz"
Note: Don’t forget to replace your_client_secret with your actual client secret.
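The same exchange can be sketched in Python with the standard library only. This version just builds the request object so it can be inspected before anything is sent; build_token_request is a hypothetical helper, and the form fields mirror the curl command above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(
    code: str,
    client_id: str,
    client_secret: str,
    redirect_uri: str,
    accounts_server: str = "https://accounts.zoho.eu",
) -> Request:
    """Build the POST that swaps an authorization code for tokens."""
    form = {
        "grant_type": "authorization_code",
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
        "code": code,
    }
    # Passing bytes as data makes urllib send a form-encoded body,
    # which is what curl --data does as well.
    return Request(
        f"{accounts_server}/oauth/v2/token",
        data=urlencode(form).encode("ascii"),
        method="POST",
    )


token_req = build_token_request(
    code="1000.yyyy.zzzz",
    client_id="1000.xxxx",
    client_secret="your_client_secret",  # placeholder, load the real one from a secret store
    redirect_uri="https://yourdomain/oauth/callback",
)
print(token_req.get_method(), token_req.full_url)
```

To actually send it, pass token_req to urllib.request.urlopen and decode the JSON body of the response.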
Sample Response:
{
"access_token": "1000.ttt",
"refresh_token": "1000.uuu",
"scope": "ZohoMail.messages.READ ZohoMail.accounts.READ",
"api_domain": "https://www.zohoapis.eu",
"token_type": "Bearer",
"expires_in": 3600
}
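Since expires_in is 3600 in this response, the access token is only good for about an hour, so it helps to remember when it was issued. Below is a small sketch of how I might track that; the Token class, the 60-second leeway, and the choice to treat refresh_token as optional are my own assumptions, not anything Zoho prescribes.

```python
import json
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    access_token: str
    refresh_token: Optional[str]
    expires_at: float  # Unix timestamp after which the token is invalid

    def is_expired(self, now: Optional[float] = None, leeway: float = 60.0) -> bool:
        """Treat the token as expired slightly early to avoid racing the deadline."""
        now = time.time() if now is None else now
        return now >= self.expires_at - leeway


def token_from_response(body: str, issued_at: float) -> Token:
    """Turn the JSON token response into a Token (hypothetical helper)."""
    data = json.loads(body)
    return Token(
        access_token=data["access_token"],
        refresh_token=data.get("refresh_token"),
        expires_at=issued_at + data["expires_in"],
    )


sample = '{"access_token": "1000.ttt", "refresh_token": "1000.uuu", "expires_in": 3600}'
token = token_from_response(sample, issued_at=0.0)
print(token.expires_at, token.is_expired(now=3500.0), token.is_expired(now=3550.0))
```

In real use you would pass time.time() as issued_at and check is_expired() before each API call.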
You need your Zoho Mail account ID to access your messages. List your accounts to find it:
curl --request GET \
--url https://mail.zoho.eu/api/accounts \
--header "Authorization: Zoho-oauthtoken 1000.ttt"
Note: This may return an error if your token scope doesn't include ZohoMail.accounts.READ.
Once you have the account ID:
curl --request GET \
--url "https://mail.zoho.eu/api/accounts/youraccountid/messages/view?limit=10" \
--header "Authorization: Zoho-oauthtoken 1000.ttt"
Note: Don’t forget to replace youraccountid with your actual account ID.
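Both requests above share the same base URL and the same Authorization header, so in Python I would wrap them in one small helper. This is only a sketch: mail_request is a made-up name, and the Accept header is borrowed from the attachment calls later in this post.

```python
from urllib.request import Request

MAIL_API = "https://mail.zoho.eu/api"  # EU data center, as in the curl calls


def mail_request(path: str, access_token: str) -> Request:
    """Build an authenticated GET request against the Zoho Mail API."""
    return Request(
        f"{MAIL_API}/{path.lstrip('/')}",
        headers={
            # Zoho expects the "Zoho-oauthtoken" prefix shown in the curl calls.
            "Authorization": f"Zoho-oauthtoken {access_token}",
            "Accept": "application/json",
        },
    )


accounts_req = mail_request("accounts", "1000.ttt")
messages_req = mail_request("accounts/youraccountid/messages/view?limit=10", "1000.ttt")
print(accounts_req.full_url)
print(messages_req.full_url)
```

Each request can then be sent with urllib.request.urlopen and the body decoded with json.load.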
Example Response:
{
"status": {
"code": 200,
"description": "success"
},
"data": [
{
"summary": "Hey MC, Welcome 🙏 Mahmut ÇAVDAR...",
"subject": "Re: Test Email",
"fromAddress": "mahmutcvdr@gmail.com",
"hasAttachment": "0",
"folderId":"7192274xxxx000002014",
"messageId":"17525962xx222005600",
....
}
]
}
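One detail worth noticing in this response is that hasAttachment is a string, not a boolean or a number. A short sketch of filtering on it is below. The second message in the payload is invented for illustration, and I am assuming that any value other than "0" means the message has an attachment.

```python
def messages_with_attachments(payload: dict) -> list:
    """Return the messages whose hasAttachment flag is not the string "0"."""
    return [
        message
        for message in payload.get("data", [])
        if message.get("hasAttachment", "0") != "0"
    ]


payload = {
    "status": {"code": 200, "description": "success"},
    "data": [
        {   # taken from the example response above
            "subject": "Re: Test Email",
            "hasAttachment": "0",
            "folderId": "7192274xxxx000002014",
            "messageId": "17525962xx222005600",
        },
        {   # hypothetical message, added only to show the filter matching
            "subject": "Application",
            "hasAttachment": "1",
            "folderId": "folder-placeholder",
            "messageId": "message-placeholder",
        },
    ],
}

for message in messages_with_attachments(payload):
    print(message["folderId"], message["messageId"])
```

The folderId and messageId pairs that come out of this filter are exactly what the attachment endpoints need.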
If a message has no attachment, "hasAttachment" is "0". If it has one, use the folderId and messageId from the response to fetch the attachment metadata:
curl "https://mail.zoho.eu/api/accounts/youraccountid/folders/7192274xxxx000002014/messages/17525962xx222005600/attachmentinfo?includeInline=true" \
-X GET \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization:Zoho-oauthtoken 1000.ttt"
Example Response:
{
"status": { "code": 200, "description": "success" },
"data": {
"attachments": [
{
"attachmentName": "CV_Cavdar.pdf",
"attachmentId": "139719xxx010000"
}
]
}
}
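From this metadata you can derive one download URL per attachment. Here is a sketch of that step in Python. The helper name attachment_urls is mine, and the URL pattern is simply the one used by the download request in this post.

```python
def attachment_urls(account_id: str, folder_id: str, message_id: str, info: dict) -> list:
    """Return (attachmentName, download URL) pairs for one message."""
    base = (
        f"https://mail.zoho.eu/api/accounts/{account_id}"
        f"/folders/{folder_id}/messages/{message_id}/attachments"
    )
    # A list of pairs (not a dict) so two attachments with the same name both survive.
    return [
        (item["attachmentName"], f"{base}/{item['attachmentId']}")
        for item in info["data"]["attachments"]
    ]


info = {
    "status": {"code": 200, "description": "success"},
    "data": {
        "attachments": [
            {"attachmentName": "CV_Cavdar.pdf", "attachmentId": "139719xxx010000"}
        ]
    },
}

urls = attachment_urls("youraccountid", "7192274xxxx000002014", "17525962xx222005600", info)
for name, url in urls:
    print(name, url)
```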
Once you have the attachment ID, download the file itself:
curl "https://mail.zoho.eu/api/accounts/youraccountid/folders/7192274xxxx000002014/messages/17525962xx222005600/attachments/139719xxx010000" \
-X GET \
-H "Accept: application/octet-stream" \
-H "Content-Type: application/json" \
-H "Authorization:Zoho-oauthtoken 1000.ttt" \
-o cv.pdf
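In a script you would probably want to save each file under its attachmentName rather than a fixed cv.pdf. Because that name comes from whoever sent the email, I would strip any directory parts first. The sketch below shows one way to do it; safe_target and download are hypothetical helpers, and download is defined but not called here because it needs a real token.

```python
import shutil
from pathlib import Path, PurePosixPath
from urllib.request import Request, urlopen


def safe_target(directory: Path, attachment_name: str) -> Path:
    """Keep only the final path component of a sender-controlled file name."""
    name = PurePosixPath(attachment_name.replace("\\", "/")).name
    if name in ("", ".", ".."):
        name = "attachment"  # fallback when nothing usable is left
    return directory / name


def download(url: str, access_token: str, target: Path) -> None:
    """Stream the attachment bytes to disk (requires network and a valid token)."""
    request = Request(
        url,
        headers={
            "Authorization": f"Zoho-oauthtoken {access_token}",
            "Accept": "application/octet-stream",
        },
    )
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(request) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)


print(safe_target(Path("downloads"), "CV_Cavdar.pdf"))
print(safe_target(Path("downloads"), "../../etc/passwd"))
```

Streaming with shutil.copyfileobj avoids loading a large attachment into memory in one piece.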
And voilà — you have the attachments!
I even found an issue in the Zoho API documentation. The example on this page is incorrect — it shows the example for "Get Email Attachment Info" instead of "Get Email Attachment Content".
Wrong example:
curl "https://mail.zoho.com/api/accounts/12345678/folders/9000000002014/messages/1710915488416100001/attachmentinfo" \
-X GET \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization:Zoho-oauthtoken *****"
Correct Example:
curl "https://mail.zoho.com/api/accounts/12345678/folders/9000000002014/messages/1710915488416100001/attachments/139712110853010000" \
-X GET \
-H "Accept: application/octet-stream" \
-H "Content-Type: application/json" \
-H "Authorization:Zoho-oauthtoken *****"
The bug has been reported here.
In Part 2, I’ll use LlamaIndex to parse PDF files and structure their contents for downstream processing. I’ll likely use this sample invoice dataset as a starting point.
In Part 3, I’ll bring in LangChain to build an agentic system that can run user queries over the ingested content and support basic BI automation tasks.