ML & Fine-tuning

What Is Fine-Tuning? When to Customise an LLM

Fine-tuning is the process of further training a pre-trained large language model on a smaller, task-specific dataset so it adapts to a particular style, domain, or behaviour. It adjusts the model's weights — unlike prompting or RAG — making the new behaviour intrinsic to the model rather than supplied at query time.

Dishant Sethi ·Updated Jun 20, 2026

How does fine-tuning work?

Fine-tuning takes a model that already understands language and trains it further on examples of the specific behaviour you want. The model has seen the world in pre-training; fine-tuning teaches it your task.

In practice you assemble a dataset of input–output pairs that demonstrate the target behaviour — questions and ideal answers, prompts and correctly formatted responses. You then run additional training so the model's weights shift toward producing those outputs. The result is a model that handles your task more reliably and consistently than the base model with prompting alone, because the behaviour is now baked into its weights rather than coaxed out at runtime.

Modern fine-tuning often uses efficient methods like LoRA, which adjust a small set of added parameters instead of the full model — cutting the compute and cost of customisation substantially.

Fine-tuning vs RAG vs prompting

These three are the main ways to customise an LLM, and they are not mutually exclusive.

MethodChangesBest for
PromptingNothing — instructions onlyQuick iteration, general tasks
RAGKnowledge available at query timeCurrent, factual, changing data
Fine-tuningThe model's weightsConsistent style, format, domain behaviour

The rule of thumb: start with prompting, add RAG when you need external knowledge, and fine-tune when you need consistent behaviour that prompting can't reliably produce. A production system often uses all three — a fine-tuned model, grounded with RAG, steered by a good prompt.

When is fine-tuning worth it?

Fine-tuning is worth the effort when prompting has hit its ceiling: the model can do the task but not consistently, or the prompts have grown long and brittle. It's also the right tool when you need a smaller, cheaper model to match a larger one on a narrow task — the basis of model distillation.

Prodinit used this at scale for a high-volume voice AI platform, fine-tuning GPT-4o-mini on production examples so it could replace the much larger GPT-4.1 on the platform's conversational task — cutting inference cost 70% while holding quality steady across 10,000+ calls per day.

Frequently Asked Questions

Fine-tuning changes the model's weights to adjust how it behaves; RAG supplies external information at query time to change what it knows. Fine-tuning is best for consistent style or domain behaviour, RAG for current or changing facts. They are complementary — many systems fine-tune for behaviour and use RAG for knowledge.

Fine-tune when prompting can't produce consistent results, when prompts have become long and fragile, or when you need a smaller, cheaper model to match a larger one on a specific task. If a well-written prompt already gives reliable results, fine-tuning usually isn't worth the added cost and complexity.

Less than people expect. Efficient methods like LoRA and task-narrow datasets mean useful fine-tunes can come from thousands to tens of thousands of high-quality examples rather than millions. Quality and relevance of the examples matter far more than raw volume — examples drawn from real production traffic tend to work best.

Yes. Distillation uses fine-tuning as its mechanism: you fine-tune a smaller student model on a larger teacher model's outputs so it reproduces the teacher's quality at lower cost. All distillation involves fine-tuning, but fine-tuning has broader uses than distillation alone.

How Prodinit does this in productionHow we fine-tuned GPT-4o-mini to match a far larger model at 70% lower cost Read the case study

Stay ahead in AI engineering.

Get the latest insights on building production AI systems, be the first to explore approaches that actually work beyond the demo.

Start a Project →