How does RAG work?
RAG adds a retrieval step in front of the language model. Instead of answering from memory alone, the system first fetches information relevant to the question, then asks the model to answer using that information.
The flow has four stages. First, your source content is split into chunks, and each chunk is converted into an embedding (a numerical vector that represents its meaning) and stored in a vector database; structured sources such as a SQL database can instead be queried directly. Second, when a user asks a question, the question is embedded the same way and used to retrieve the most relevant chunks. Third, those chunks are inserted into the model's prompt as context. Fourth, the model generates an answer grounded in that retrieved context rather than in its training data alone.
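The four stages can be sketched end to end in a few functions. This is a minimal illustration under loud assumptions: `embed` is a toy word-count vector standing in for a real embedding model, a Python list stands in for the vector database, the example documents are invented, and stage four (the model call) is left out because it depends on whichever LLM API you use.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 50) -> list[str]:
    """Stage 1a: split a document into fixed-size windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Stage 1b (toy): a word-count vector standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents: list[str], size: int = 50) -> list[tuple[str, Counter]]:
    """Stage 1c: store (chunk, vector) pairs; a vector database plays this role in practice."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc, size)]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Stage 2: embed the question the same way and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Stage 3: insert the retrieved chunks into the prompt as numbered context."""
    blocks = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the context below. If it does not contain the answer, say so.\n\n"
        f"Context:\n{blocks}\n\nQuestion: {question}\nAnswer:"
    )


docs = [
    "Refunds are processed within 5 business days of receiving the returned item.",
    "Our office is closed on public holidays. Support tickets are answered the next working day.",
]
index = build_index(docs)
question = "How long do refunds take?"
context = retrieve(index, question, k=1)
prompt = build_prompt(question, context)
print(prompt)
# Stage 4 would send `prompt` to a language model; that call is omitted here.
```

Swapping `embed` for a real embedding model and the list for a vector database changes the components but not the shape of the pipeline: chunk, embed, retrieve, then prompt.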
The payoff is accuracy on information the model never saw in training — your internal documents, a live database, last week's data — without retraining the model itself.
RAG vs fine-tuning: which do you need?
RAG and fine-tuning are often presented as alternatives, but they solve different problems and are frequently combined.
| | RAG | Fine-tuning |
|---|---|---|
| Changes | What the model knows at query time | How the model behaves |
| Best for | Current, factual, changing information | Consistent style, format, domain behaviour |
| Update cost | Add or update documents in the index; no retraining | Retrain the model |
| Risk | Retrieval misses relevant context | Stale knowledge baked into weights |
Use RAG when answers depend on information that changes or is too large to memorise. Use fine-tuning when you need consistent behaviour or tone. Many production systems do both: fine-tune for behaviour, RAG for knowledge.
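To make the "Update cost" row concrete, here is a minimal sketch, assuming a toy in-memory index: `ToyIndex` is hypothetical, word-count vectors stand in for a real embedding model, and the pricing text is invented. Changing what the system knows means re-embedding one document; the next query retrieves the new text without touching any model weights.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy word-count vector standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.entries: dict[str, tuple[str, Counter]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Adding or replacing a document re-embeds only that document;
        # the language model itself is untouched.
        self.entries[doc_id] = (text, embed(text))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries.values(), key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


index = ToyIndex()
index.upsert("support", "Support is available by email on weekdays.")
index.upsert("pricing", "The Pro plan costs 20 dollars per month.")
query = "How much does the Pro plan cost?"
before = index.search(query)

# Knowledge changes: update the document, not the model.
index.upsert("pricing", "The Pro plan costs 25 dollars per month from March.")
after = index.search(query)
print(before)
print(after)
```

The fine-tuning equivalent of this one-line update would be a new training run, which is why RAG suits information that changes often.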
When should you use RAG?
RAG is the right choice whenever correct answers depend on specific, current, or proprietary information rather than general knowledge. Typical cases include answering from internal documentation, querying live business data, customer support over a product knowledge base, and any domain where a wrong-but-confident answer is costly.
Prodinit applied this pattern for a digital health research company, building a natural-language-to-SQL layer over a live PostgreSQL database of clinical trial data. Non-technical stakeholders could ask questions like "which site has the highest screen-failure rate in women under 40?" and get instant, grounded answers — retrieving from real data rather than relying on a model's training.
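The case study does not describe the implementation, so the following is only a hypothetical sketch of the general natural-language-to-SQL pattern, not Prodinit's system. It assumes SQLite in place of PostgreSQL, an invented `screenings` table with made-up rows, and a stubbed `generate_sql` where a real model call would go. The retrieval step here is twofold: the table schema is fetched to guide SQL generation, and the query result becomes the context for the final answer.

```python
import sqlite3


def schema_context(conn: sqlite3.Connection) -> str:
    """Retrieve the table definitions so the model writes SQL against the real schema."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()
    return "\n".join(row[0] for row in rows)


def generate_sql(question: str, schema: str) -> str:
    """Hypothetical placeholder for the model call. A real system would prompt an LLM
    with `question` and `schema`; here we hard-code the SQL such a model might return."""
    return (
        "SELECT site, AVG(screen_failed) AS failure_rate, COUNT(*) AS n "
        "FROM screenings WHERE sex = 'F' AND age < 40 "
        "GROUP BY site ORDER BY failure_rate DESC LIMIT 1"
    )


def run_select(conn: sqlite3.Connection, sql: str) -> list[dict]:
    """Execute model-written SQL, rejecting anything that is not a SELECT.
    A read-only database role is a stronger safeguard than this string check."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Made-up rows in a hypothetical table; not real trial data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE screenings (site TEXT, sex TEXT, age INTEGER, screen_failed INTEGER)")
conn.executemany(
    "INSERT INTO screenings VALUES (?, ?, ?, ?)",
    [
        ("Site A", "F", 32, 1), ("Site A", "F", 35, 0), ("Site A", "M", 30, 1),
        ("Site B", "F", 28, 1), ("Site B", "F", 38, 1), ("Site B", "F", 45, 0),
        ("Site C", "F", 25, 0),
    ],
)

question = "Which site has the highest screen-failure rate in women under 40?"
sql = generate_sql(question, schema_context(conn))
result = run_select(conn, sql)
# The rows become the grounding context for the model's final, plain-English answer.
answer_prompt = f"Question: {question}\nQuery result: {result}\nAnswer using only this result."
print(result)  # → [{'site': 'Site B', 'failure_rate': 1.0, 'n': 2}]
```

Because the answer is computed from the live rows at query time, adding new screening records changes the result on the next question with no retraining.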