Fine-tuning
Fine-tuning is the process of taking a pretrained model and continuing to train it on a target dataset, so its weights adapt to a specific task, domain, style, or output format. It is the standard way to customise a model when prompting alone is not enough.
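A minimal sketch of what "continuing to train on a target dataset" can look like, using Hugging Face Transformers; the model name, data file, and hyperparameters are illustrative assumptions, not part of this document.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # placeholder small base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Target dataset: a plain-text domain corpus (file name is a placeholder).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_set = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out",
                           num_train_epochs=1,
                           per_device_train_batch_size=4,
                           learning_rate=2e-5),  # small LR: adapt weights, don't overwrite them
    train_dataset=train_set,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM loss
)
trainer.train()
```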
Variants
- Full fine-tuning. Updates every parameter. Maximum capacity, maximum cost.
- LoRA (Low-Rank Adaptation). Trains a small low-rank update on top of frozen weights. Fast, cheap, and the result is a small adapter file (see the first sketch after this list).
- QLoRA. LoRA on a quantised base model, enabling fine-tuning of large models on a single GPU (see the second sketch after this list).
- Instruction tuning. Fine-tuning on instruction-response pairs to make a base model follow instructions.
- RLHF (reinforcement learning from human feedback) and DPO (direct preference optimisation). Align model outputs with human preferences; the final training stage in most modern chat models.
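A sketch of the LoRA setup using the PEFT library. The base model is the same placeholder as above, and the rank, scaling, and target-module choices are illustrative assumptions; `target_modules` varies by architecture (`c_attn` is the attention projection in GPT-2).

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

lora_cfg = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # which frozen layers get adapters (architecture-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# ...train as usual, then save just the adapter:
model.save_pretrained("lora-adapter")  # small adapter file; base weights untouched
```

The payoff is in the last line: only the low-rank matrices are saved, so one base model can serve many tasks by swapping adapters.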
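And a sketch of the QLoRA variant: load the base model 4-bit quantised, then attach a LoRA adapter as above. The quantisation settings shown are the common defaults, but model name and hyperparameters remain illustrative assumptions.

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, introduced in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in higher precision than storage
)
model = AutoModelForCausalLM.from_pretrained("gpt2", quantization_config=bnb_cfg)
model = prepare_model_for_kbit_training(model)  # freeze quantised weights, enable needed grads
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["c_attn"],
                                         task_type="CAUSAL_LM"))
```

Because the frozen base sits in 4-bit memory while only the small adapter trains in full precision, models that would not otherwise fit on one GPU become fine-tunable.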