Chat with your PDF documents using PageIndex vectorless RAG via Telegram

Created by

Last update

Last update 8 hours ago

👤 Who Is This For?

This template is built for developers, researchers, and automation builders who want to create a document Q&A system — without the complexity of vector databases, embeddings, or chunking pipelines.

It's perfect for:

Developers exploring next-generation RAG architectures
Teams building internal knowledge bots over PDFs (reports, manuals, contracts)
Anyone who wants to query documents through Telegram with a clean, no-infrastructure setup

❓ What Problem Does This Solve?

Traditional RAG systems require converting text into vectors, storing them in a vector database, and relying on semantic similarity to retrieve relevant chunks. This approach has known weaknesses:

Similarity ≠ Relevance - queries express intent, not exact content
Chunking breaks context - arbitrary splits destroy meaning across sections
In-document references are missed - e.g. "see Appendix B" has no semantic match

PageIndex solves this differently. Instead of vectors, it builds a hierarchical tree index (like a Table of Contents) from your PDF using an LLM. At query time, the LLM reasons over that tree — identifies the most relevant sections, retrieves only those, and generates a precise, cited answer.

No embeddings. No vector DB. No chunking.

⚡ What This Workflow Does

This n8n template delivers a fully working Telegram-based RAG bot with two independent flows in a single workflow:

📄 Flow 1 → PDF Knowledge Upload (Run Once per Document)
Send a PDF file to your Telegram bot. The workflow downloads it and uploads it to PageIndex cloud, where the tree index is built automatically.

💬 Flow 2 → Q&A Chat (Runs Every Time)
Send any question as a text message to the same Telegram bot. The workflow fetches all your indexed documents, sends the question to PageIndex's LLM reasoning engine, and delivers a cited answer back to your Telegram chat.

🔄 How It Works

Flow 1 - PDF Upload

Receive PDF Document - Telegram Trigger listens for messages containing a file. Send any PDF to the bot to start indexing.
Download PDF File - The bot downloads the binary PDF from Telegram's file storage using the file_id.
Index PDF on PageIndex - The PDF is uploaded to PageIndex cloud via POST /doc/. PageIndex builds a hierarchical tree index (TOC with LLM-generated summaries per section). Returns a doc_id. No vectors are created.

Flow 2 - Q&A

Receive User Question - Telegram Trigger listens for text messages. Any message triggers the Q&A flow.
Fetch All Indexed Documents - Calls GET /docs on PageIndex to retrieve all previously uploaded documents.
Extract Document IDs - Maps the documents list into a clean array of doc_id strings.
LLM Reasoning over Document Tree - Sends the user's question + all doc_ids to PageIndex POST /chat/completions. PageIndex's LLM traverses the tree, identifies the relevant nodes, retrieves the raw text, and generates an answer with page citations.
Send Answer to User The answer is delivered back to the exact Telegram user who asked, using their chat_id.

🛠️ Setup Instructions

Step 1 - Create a Telegram Bot

Open Telegram and message @BotFather
Send /newbot and follow the prompts
Copy the Bot Token provided
In n8n, add a new Telegram credential and paste the token

Step 2 - Get Your PageIndex API Key

Visit dash.pageindex.ai and create a free account
Go to API Keys and generate a new key
In the workflow, replace YOUR_PAGEINDEX_API_KEY in these three nodes:
- ☁️ Index PDF on PageIndex
- 📚 Fetch All Indexed Documents
- 🧠 LLM Reasoning over Document Tree

Step 3 - Connect Telegram Credentials

Both Telegram Trigger nodes and the Telegram send node use the same credential. Set your Telegram API credentials once and n8n will apply them across all nodes automatically.

Step 4 - Activate the Workflow

Click Activate in n8n
Send a PDF file to your Telegram bot → it gets indexed
Send any text question → get an LLM-reasoned answer back

📋 Required Credentials

Service	Where to Get	Used In
Telegram Bot Token	@BotFather on Telegram	All Telegram nodes
PageIndex API Key	API Key From Dashboard	Upload + Chat nodes

💡 How to Customize

Query multiple documents at once - Upload multiple PDFs (each creates a separate doc_id). The Q&A flow automatically fetches all of them and reasons across all documents simultaneously.
Change temperature - In the LLM Reasoning over Document Tree node, adjust "temperature": 0.5 for more creative (higher) or more precise (lower) answers.
Enable/disable citations - Toggle "enable_citations": true/false in the chat node body to control whether page references appear in answers.
Filter by specific document - Modify the Extract Document IDs node to filter only documents with status: completed or by name to limit which docs are queried.
Replace Telegram with another interface - Swap the Telegram Trigger nodes for a Webhook or Form Trigger if you want to build a web-based version instead.

📦 About PageIndex

PageIndex is an open-source vectorless RAG framework by VectifyAI. It powers the Mafin 2.5 financial assistant which achieved 98.7% accuracy on FinanceBench - significantly outperforming GPT-4o (~31%) on document-intensive tasks.

🔧 Technical Notes

PDFs sent via Telegram must be under 20MB (Telegram Bot API limit)
PageIndex document processing typically takes 10-60 seconds depending on PDF size - the first question after upload may take slightly longer if the doc is still being indexed
All indexed documents persist permanently in your PageIndex account and can be reused across sessions without re-uploading

🤝 Need Help?

Feel free to reach out via the n8n Community Forum or check out more automation templates on AppStoneLab Technologies.