Document Chat (RAG)
Upload PDFs, ingest them into a local vector store, and chat with your documents using retrieval-augmented generation.
How RAG Works
Retrieval-Augmented Generation (RAG) combines the power of semantic search with LLM reasoning. Documents are chunked, embedded, and stored in a vector database. When you ask a question, the system retrieves the most relevant chunks and uses them to ground the AI's response.
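The retrieval step at the heart of this flow can be expressed in a few lines. The Python sketch below is purely illustrative: it assumes chunk embeddings are already computed and uses hypothetical embed() and generate() helpers in place of whatever embedding model and LLM you have configured.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(question, chunks, embed, generate, top_k=5):
    # chunks: list of dicts with precomputed "text" and "embedding" fields.
    # embed/generate: stand-ins for your configured embedding model and LLM.
    q_vec = embed(question)
    # Rank chunks by semantic similarity to the question.
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, c["embedding"]), reverse=True)
    context = "\n\n".join(c["text"] for c in ranked[:top_k])
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    return generate(prompt)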
PDF Ingestion
Upload and parse PDF documents with text extraction.
Semantic Chunking
Break documents into meaningful segments with overlap so context carries across chunk boundaries (see the chunking sketch after this list).
Vector Storage
Embed chunks with OpenAI or local embedding models and store the resulting vectors in MongoDB.
Hybrid Search
Combine keyword and semantic search for more robust retrieval (see the hybrid-scoring sketch after this list).
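The chunking step can be as simple as a sliding window over the extracted text. This is a minimal sketch assuming character-based sizes; a real pipeline may split on sentence or token boundaries instead.

def chunk_text(text, chunk_size=1000, overlap=200):
    # Split text into fixed-size chunks that overlap so context spans boundaries.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` characters of context
    return chunks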
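Hybrid search can be approximated by fusing a keyword score with the vector similarity score. The weighted blend below is one common strategy, shown for illustration only; it is not necessarily the fusion method this project implements.

import numpy as np

def keyword_score(query, text):
    # Crude keyword signal: fraction of query terms that appear in the chunk.
    terms = set(query.lower().split())
    return sum(1 for t in terms if t in text.lower()) / max(len(terms), 1)

def hybrid_rank(query, query_vec, chunks, alpha=0.5, top_k=5):
    # Blend keyword and semantic scores; alpha controls the balance.
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = sorted(
        chunks,
        key=lambda c: alpha * keyword_score(query, c["text"])
        + (1 - alpha) * cosine(query_vec, c["embedding"]),
        reverse=True,
    )
    return scored[:top_k]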
Document Ingestion API
Use the REST API to upload documents and trigger the ingestion pipeline.
curl -X POST http://localhost:5000/api/v1/documents/upload \
-F "file=@research_paper.pdf" \
-F "metadata={\"title\":\"AI Research\",\"tags\":[\"ai\",\"ml\"]}"
# Response:
{
"document_id": "doc_abc123",
"chunks_created": 45,
"status": "indexed"
}
Querying Documents
Ask natural language questions and get answers grounded in your uploaded documents with source citations.
{
"query": "What are the key findings about neural networks?",
"document_ids": ["doc_abc123"],
"top_k": 5
}
# Response:
{
"answer": "The key findings indicate that...",
"sources": [
{ "chunk_id": "chunk_12", "page": 3, "relevance": 0.92 },
{ "chunk_id": "chunk_45", "page": 8, "relevance": 0.88 }
]
}
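The same query can also be sent from Python. The endpoint path below (/api/v1/documents/query) is an assumption made by analogy with the upload endpoint shown earlier; check your deployment's API reference for the exact route.

import requests

payload = {
    "query": "What are the key findings about neural networks?",
    "document_ids": ["doc_abc123"],
    "top_k": 5,
}
# NOTE: the /query path is assumed, not confirmed by this documentation.
resp = requests.post("http://localhost:5000/api/v1/documents/query", json=payload, timeout=60)
resp.raise_for_status()
data = resp.json()
print(data["answer"])
for src in data["sources"]:
    print(f'page {src["page"]} (relevance {src["relevance"]:.2f})')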
Privacy & Local Storage
All document embeddings and vector data are stored locally in MongoDB. No data is sent to external services unless you explicitly configure an external embedding API. You can use local embedding models for complete offline operation.
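For fully offline operation, embeddings can be generated with a local model and written straight to MongoDB. The sketch below uses sentence-transformers and pymongo as one way that setup can look; the database, collection, and field names are illustrative, not the ones this project necessarily uses.

from pymongo import MongoClient
from sentence_transformers import SentenceTransformer

# Local embedding model: no text leaves the machine.
model = SentenceTransformer("all-MiniLM-L6-v2")
collection = MongoClient("mongodb://localhost:27017")["rag"]["chunks"]  # illustrative names

def index_chunks(document_id, chunks):
    # Embed each chunk locally and store the text alongside its vector.
    vectors = model.encode(chunks)
    collection.insert_many([
        {"document_id": document_id, "text": text, "embedding": vec.tolist()}
        for text, vec in zip(chunks, vectors)
    ])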