Architecture
RAG (Retrieval-Augmented Generation)
Retrieval-Augmented Generation (RAG) mitigates the "hallucination" and "stale knowledge" problems inherent in frozen LLM weights. When a user asks a question, a retriever queries a vector database for relevant document chunks, which are injected into the LLM's context window alongside the question. The LLM is then prompted to ground its answer in the retrieved context rather than relying solely on its parametric memory.
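The retrieve-then-prompt flow can be sketched in a few lines. This is a minimal, self-contained illustration: the document list, the bag-of-words "embedding", and the prompt template are all hypothetical stand-ins; a real system would use a learned embedding model, an actual vector database, and an LLM call where this sketch only prints the assembled prompt.

```python
from collections import Counter
import math

# Toy corpus standing in for a vector database (hypothetical documents).
DOCS = [
    "RAG injects retrieved document chunks into the LLM context window.",
    "A vector database stores embeddings for similarity search.",
    "Frozen LLM weights cannot reflect events after training.",
]

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a
    # learned embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=2):
    # Rank all documents by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # Retrieved chunks are injected into the context ahead of the question;
    # the LLM call itself is omitted from this sketch.
    context = "\n".join(f"- {c}" for c in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does RAG keep LLM answers current?"))
```

The grounding instruction at the top of the prompt is what steers the model toward the retrieved context instead of its parametric memory; swapping the toy retriever for a real embedding model and vector store leaves this overall structure unchanged.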