Fine-tuning

Fine-tuning is the process of taking a pretrained model and continuing to train it on a target dataset, so its weights adapt to a specific task, domain, style, or output format. It is the standard way to customise a model when prompting alone is not enough.
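The core idea can be shown with a toy model: start from weights learned on one task and continue gradient descent on a small target dataset. Everything below is illustrative, not a real training API.

```python
# Toy sketch of fine-tuning: a pretrained 1-D linear model y = w * x
# continues training on a new target dataset, starting from the
# pretrained weight rather than a random initialisation.

pretrained_w = 2.0                       # weight learned on the original task
target_data = [(1.0, 3.0), (2.0, 6.0)]   # the target task wants w ≈ 3

w = pretrained_w                          # initialise from pretrained weights
lr = 0.05
for _ in range(200):
    for x, y in target_data:
        grad = 2 * (w * x - y) * x        # d/dw of the squared error (w*x - y)^2
        w -= lr * grad
# w has moved from the pretrained value 2.0 toward the target optimum 3.0
```

The same shape holds at scale: the loop is ordinary training, and only the starting point (pretrained weights) and the dataset (target task) change.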

Variants

  • Full fine-tuning. Updates every parameter. Maximum capacity, maximum cost.
  • LoRA (Low-Rank Adaptation). Trains a small low-rank update on top of frozen weights. Fast, cheap, and the result is a small adapter file.
  • QLoRA. LoRA on a quantised base model, enabling fine-tuning of large models on a single GPU.
  • Instruction tuning. Fine-tuning on instruction-response pairs to make a base model follow instructions.
  • RLHF (reinforcement learning from human feedback) and DPO (direct preference optimisation). Aligning model outputs with human preferences, used in modern chat models.
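The LoRA variant above can be sketched in a few lines: the frozen weight matrix W gets a trained low-rank correction B·A, scaled by α/r. The function names here are illustrative, not a real library API.

```python
# Minimal LoRA sketch: instead of updating the full weight matrix W
# (d_out x d_in), train a low-rank update B @ A, where A is (r x d_in)
# and B is (d_out x r) with r much smaller than d_out and d_in.
# The effective weight is W + (alpha / r) * (B @ A).

import random

def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, alpha, r):
    """y = x @ (W + (alpha/r) * B @ A)^T: frozen W plus trained low-rank delta."""
    delta = matmul(B, A)                              # d_out x d_in, rank <= r
    scale = alpha / r
    W_eff = [[w + scale * d for w, d in zip(wr, dr)]
             for wr, dr in zip(W, delta)]
    return matmul(x, [list(col) for col in zip(*W_eff)])  # x @ W_eff^T

d_in, d_out, r, alpha = 4, 3, 2, 4
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]   # B starts at zero, so the delta starts at zero

x = [[1.0, 2.0, 3.0, 4.0]]
y = lora_forward(x, W, A, B, alpha, r)
```

Because B is initialised to zero, the model starts out identical to the frozen base; training then only touches A and B, which is why the shipped artifact is a small adapter file rather than a full copy of the weights.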
Related Terms
RAG, Embeddings.
