Recently, I passed a job interview where we discussed a practical use case related to intelligent document processing and automation. In this blog series, I’ll attempt to implement key parts of that use case to evaluate how well I can design such a system and integrate the relevant tools and software.

The Use Case

We are given the following scenario:

  • We use Zoho CRM/Email to ingest data — mostly in PDF format.
  • We need to parse the contents of these documents.
  • Finally, we aim to build an agentic AI system using RAG (Retrieval-Augmented Generation) to allow users to query the documents and automate Business Intelligence (BI) tasks.

Roadmap

Part 1 – Ingest Data (Zoho API)

Part 2 – Parse Data (LlamaIndex)

Part 3 – Build AI Agent (LangChain + RAG)


Part 1 – Ingest Data via Zoho API

To begin, I set up a custom domain to test the Zoho integration. Authentication with the Zoho API is handled via OAuth 2.0.

Step 1: Create OAuth Credentials

I followed the guide at Zoho OAuth Request Docs and first created a Server-based Application, which provided me with a Client ID and Client Secret.

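To request the authorization code, open the following URL in a browser while logged in to Zoho (it is a single URL, split across lines here for readability):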

https://accounts.zoho.eu/oauth/v2/auth?
scope=ZohoMail.messages.READ,ZohoMail.accounts.READ&
client_id=1000.xxxx&
response_type=code&
access_type=offline&
redirect_uri=https://yourdomain/oauth/callback
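
Assembling that URL by hand is error-prone because the redirect URI and the comma-separated scopes need to be URL-encoded. A minimal Python sketch, using only the standard library and the same placeholder values as above (the function name build_authorization_url is my own, not part of any Zoho SDK):

```python
from urllib.parse import urlencode

# EU accounts server, as used throughout this post; other regions use other hosts.
AUTH_ENDPOINT = "https://accounts.zoho.eu/oauth/v2/auth"


def build_authorization_url(client_id: str, redirect_uri: str, scopes: list[str]) -> str:
    """Return the consent URL to open in a browser (hypothetical helper)."""
    params = {
        "scope": ",".join(scopes),       # Zoho expects comma-separated scopes
        "client_id": client_id,
        "response_type": "code",
        "access_type": "offline",        # same value as in the URL above
        "redirect_uri": redirect_uri,
    }
    # urlencode percent-encodes the comma, colon and slashes for us.
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_authorization_url(
        "1000.xxxx",
        "https://yourdomain/oauth/callback",
        ["ZohoMail.messages.READ", "ZohoMail.accounts.READ"],
    ))
```

The printed URL is equivalent to the hand-written one, with the special characters encoded.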

After you grant consent, Zoho redirects to your callback URL with the authorization code in the query string:

https://yourdomain/oauth/callback?
code=1000.yyyy.zzzz&
location=eu&
accounts-server=https%3A%2F%2Faccounts.zoho.eu
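
If the callback is handled in code rather than copied from the browser's address bar, the query string can be unpacked as follows. This is a sketch under the assumption that the callback carries exactly the parameters shown above. The check for an error parameter follows the general OAuth 2.0 convention and is not something the redirect above demonstrates.

```python
from urllib.parse import parse_qs, urlparse


def parse_callback(callback_url: str) -> dict[str, str]:
    """Extract the authorization code and region hints from the redirect URL."""
    query = parse_qs(urlparse(callback_url).query)
    # ASSUMPTION: failures are signalled with an "error" parameter (OAuth 2.0 convention).
    if "error" in query:
        raise ValueError(f"authorization failed: {query['error'][0]}")
    if "code" not in query:
        raise ValueError("callback URL carries no authorization code")
    return {
        "code": query["code"][0],
        "location": query.get("location", [""])[0],
        # parse_qs already percent-decodes the value.
        "accounts_server": query.get("accounts-server", [""])[0],
    }


if __name__ == "__main__":
    print(parse_callback(
        "https://yourdomain/oauth/callback?code=1000.yyyy.zzzz"
        "&location=eu&accounts-server=https%3A%2F%2Faccounts.zoho.eu"
    ))
```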

Step 2: Get Access Token

Exchange the authorization code for an access token:

curl --request POST \
  --url "https://accounts.zoho.eu/oauth/v2/token" \
  --data "grant_type=authorization_code" \
  --data "client_id=1000.xxxx" \
  --data "client_secret=your_client_secret" \
  --data "redirect_uri=https://yourdomain/oauth/callback" \
  --data "code=1000.yyyy.zzzz"

Note: Replace 1000.xxxx, your_client_secret, and 1000.yyyy.zzzz with your actual client ID, client secret, and authorization code.

Sample Response:

{
  "access_token": "1000.ttt",
  "refresh_token": "1000.uuu",
  "scope": "ZohoMail.messages.READ ZohoMail.accounts.READ",
  "api_domain": "https://www.zohoapis.eu",
  "token_type": "Bearer",
  "expires_in": 3600
}
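
The same exchange can be sketched in Python with the standard library. The function names are hypothetical, and the expires_at field is something this sketch adds to the parsed response so that later code can tell when the token (valid for expires_in seconds) needs replacing. It is not part of Zoho's reply.

```python
import json
import time
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TOKEN_ENDPOINT = "https://accounts.zoho.eu/oauth/v2/token"


def build_token_request(client_id: str, client_secret: str,
                        redirect_uri: str, code: str) -> Request:
    """Build the form-encoded POST that mirrors the curl command above."""
    body = urlencode({
        "grant_type": "authorization_code",
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
        "code": code,
    }).encode("ascii")
    return Request(TOKEN_ENDPOINT, data=body, method="POST")


def parse_token_response(raw: bytes, now: Optional[float] = None) -> dict:
    """Decode the JSON reply and add an absolute expiry timestamp."""
    payload = json.loads(raw)
    if "access_token" not in payload:
        raise RuntimeError(f"token exchange failed: {payload}")
    issued_at = time.time() if now is None else now
    # "expires_at" is added by this sketch; it is not a field Zoho returns.
    payload["expires_at"] = issued_at + payload["expires_in"]
    return payload


def exchange_code(client_id: str, client_secret: str,
                  redirect_uri: str, code: str) -> dict:
    """Perform the network call (requires valid credentials)."""
    request = build_token_request(client_id, client_secret, redirect_uri, code)
    with urlopen(request) as response:
        return parse_token_response(response.read())
```

Splitting the request construction from the network call keeps the first two functions easy to test without real credentials.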

Step 3: Get Account ID

You need your Zoho Mail account ID to access your messages:

curl --request GET \
  --url https://mail.zoho.eu/api/accounts \
  --header "Authorization: Zoho-oauthtoken 1000.ttt"

Note: This may return an error if your token scope doesn't include ZohoMail.accounts.READ.
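
A rough Python equivalent of this call is sketched below. The response of this endpoint is not shown in this post, so the shape used here (a data list whose entries carry an accountId field) is an assumption to verify against your own output. The helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

ACCOUNTS_URL = "https://mail.zoho.eu/api/accounts"


def auth_header(access_token: str) -> dict[str, str]:
    """Authorization header in the format used by the curl commands."""
    return {"Authorization": f"Zoho-oauthtoken {access_token}"}


def extract_account_ids(payload: dict) -> list[str]:
    """Pull account IDs out of the decoded JSON response."""
    # ASSUMPTION: "data" is a list and each entry has an "accountId" key.
    return [str(entry["accountId"])
            for entry in payload.get("data", [])
            if "accountId" in entry]


def fetch_account_ids(access_token: str) -> list[str]:
    """Perform the GET request (requires a valid token)."""
    request = Request(ACCOUNTS_URL, headers=auth_header(access_token))
    with urlopen(request) as response:
        return extract_account_ids(json.load(response))
```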

Step 4: Fetch Emails

Once you have the account ID:

curl --request GET \
  --url "https://mail.zoho.eu/api/accounts/youraccountid/messages/view?limit=10" \
  --header "Authorization: Zoho-oauthtoken 1000.ttt"

Note: Replace youraccountid with your actual account ID.

Example Response:

{
  "status": {
    "code": 200,
    "description": "success"
  },
  "data": [
    {
      "summary": "Hey MC, Welcome 🙏 Mahmut ÇAVDAR...",
      "subject": "Re: Test Email",
      "fromAddress": "mahmutcvdr@gmail.com",
      "hasAttachment": "0",
      "folderId":"7192274xxxx000002014",
      "messageId":"17525962xx222005600",
      ....
    }
  ]
}
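
Since only messages with attachments matter for the PDF pipeline, the response can be filtered before any further calls. A minimal sketch, assuming the response has the shape shown above and that any hasAttachment value other than "0" means at least one attachment is present:

```python
def messages_with_attachments(payload: dict) -> list[dict]:
    """Return folder/message IDs for every message that has an attachment."""
    status = payload.get("status", {})
    if status.get("code") != 200:
        raise RuntimeError(f"unexpected status: {status}")
    selected = []
    for message in payload.get("data", []):
        # ASSUMPTION: "0" means no attachment; anything else means there is one.
        if str(message.get("hasAttachment", "0")) == "0":
            continue
        selected.append({
            "folderId": message["folderId"],
            "messageId": message["messageId"],
            "subject": message.get("subject", ""),
        })
    return selected


if __name__ == "__main__":
    example = {
        "status": {"code": 200, "description": "success"},
        "data": [
            {"subject": "Re: Test Email", "hasAttachment": "0",
             "folderId": "f1", "messageId": "m1"},
            {"subject": "CV", "hasAttachment": "1",
             "folderId": "f1", "messageId": "m2"},
        ],
    }
    print(messages_with_attachments(example))
```

The folderId and messageId pairs it returns are the inputs for the attachment calls.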

A hasAttachment value of "0" means the message has no attachment. If a message does have one, you can fetch the attachment's metadata using the folderId and messageId from this response.

Step 5: Get Attachment Info

Once you have the folder ID and message ID of a message with an attachment:

curl "https://mail.zoho.eu/api/accounts/youraccountid/folders/7192274xxxx000002014/messages/17525962xx222005600/attachmentinfo?includeInline=true" \
-X GET \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization:Zoho-oauthtoken 1000.ttt"

Example Response:

{
  "status": { "code": 200, "description": "success" },
  "data": {
    "attachments": [
      {
        "attachmentName": "CV_Cavdar.pdf",
        "attachmentId": "139719xxx010000"
      }
    ]
  }
}
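
Each attachmentId in this response maps to one download URL. The sketch below builds those URLs from the attachment info, assuming the response shape shown above and the EU mail host used in this post. The helper names are my own.

```python
from urllib.parse import quote

MAIL_API = "https://mail.zoho.eu/api"


def attachment_url(account_id: str, folder_id: str,
                   message_id: str, attachment_id: str) -> str:
    """Build the attachment-content URL from its four identifiers."""
    # quote(..., safe="") guards against stray characters in the IDs.
    account, folder, message, attachment = (
        quote(str(part), safe="")
        for part in (account_id, folder_id, message_id, attachment_id)
    )
    return (f"{MAIL_API}/accounts/{account}/folders/{folder}"
            f"/messages/{message}/attachments/{attachment}")


def attachment_downloads(info: dict, account_id: str,
                         folder_id: str, message_id: str) -> list[tuple[str, str]]:
    """Return (file name, download URL) pairs for one message."""
    attachments = info.get("data", {}).get("attachments", [])
    return [
        (item["attachmentName"],
         attachment_url(account_id, folder_id, message_id, item["attachmentId"]))
        for item in attachments
    ]
```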

Step 6: Download Attachment

Once you have the attachment ID:

curl "https://mail.zoho.eu/api/accounts/youraccountid/folders/7192274xxxx000002014/messages/17525962xx222005600/attachments/139719xxx010000" \
-X GET \
-H "Accept: application/octet-stream" \
-H "Content-Type: application/json" \
-H "Authorization:Zoho-oauthtoken 1000.ttt" \
-o cv.pdf

And voilà — you have the attachments!
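
For a pipeline that processes many emails, the download step can also be scripted. A minimal sketch with the standard library, assuming the download URL has already been built from the account, folder, message, and attachment IDs. The safe_filename helper is a precaution I added so that an attachment name containing path separators cannot write outside the target directory.

```python
import shutil
from pathlib import Path
from urllib.request import Request, urlopen


def build_download_request(url: str, access_token: str) -> Request:
    """GET request with the same headers as the curl command above."""
    return Request(url, headers={
        "Accept": "application/octet-stream",
        "Authorization": f"Zoho-oauthtoken {access_token}",
    })


def safe_filename(name: str) -> str:
    """Keep only the final path component of an attachment name."""
    base = Path(name.replace("\\", "/")).name
    return base if base not in ("", ".", "..") else "attachment.bin"


def download_attachment(url: str, access_token: str,
                        dest_dir: str, name: str) -> Path:
    """Stream the attachment to dest_dir (requires a valid token)."""
    destination = Path(dest_dir) / safe_filename(name)
    request = build_download_request(url, access_token)
    with urlopen(request) as response, open(destination, "wb") as out:
        # Copy in chunks so large PDFs are not held in memory at once.
        shutil.copyfileobj(response, out)
    return destination
```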

Along the way, I found an issue in the Zoho Mail API documentation: the page for "Get Email Attachment Content" shows the example for "Get Email Attachment Info" instead.

Wrong example:

curl "https://mail.zoho.com/api/accounts/12345678/folders/9000000002014/messages/1710915488416100001/attachmentinfo" \
-X GET \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization:Zoho-oauthtoken *****"

Correct example:

curl "https://mail.zoho.com/api/accounts/12345678/folders/9000000002014/messages/1710915488416100001/attachments/139712110853010000" \
-X GET \
-H "Accept: application/octet-stream" \
-H "Content-Type: application/json" \
-H "Authorization:Zoho-oauthtoken *****"

The bug has been reported.


Next Steps

In Part 2, I’ll use LlamaIndex to parse PDF files and structure their contents for downstream processing. I’ll likely use a sample invoice dataset as a starting point.

In Part 3, I’ll bring in LangChain to build an agentic system that can run user queries over the ingested content and support basic BI automation tasks.
