QLoRA vs LoRA vs full fine-tuning

These three fine-tuning methods trade GPU memory against flexibility. Full fine-tuning updates every weight, LoRA trains small adapters on a frozen model, and QLoRA adds 4-bit quantization so large models fit on small GPUs. Here's how to choose.

The three methods compared

| Method | What it does | GPU memory |
| --- | --- | --- |
| Full fine-tune | Updates all model weights | Highest: weights, gradients, and optimizer state for every parameter |
| LoRA | Trains small low-rank adapters on a frozen base | Moderate: only the adapters have gradients and optimizer state |
| QLoRA | LoRA on a 4-bit quantized base | Lowest: quantization shrinks the frozen weights sharply |
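To make "small low-rank adapters" concrete, the sketch below counts the trainable weights a LoRA adapter adds to a single linear layer. It assumes the usual LoRA formulation, where a frozen weight matrix of shape d_out x d_in is paired with two trainable factors of rank r. The layer size (4096 x 4096) and rank (8) are illustrative values, not VaultLayer defaults.

```python
def full_layer_params(d_in: int, d_out: int) -> int:
    """Weights in a dense layer of shape (d_out x d_in), ignoring bias."""
    return d_in * d_out


def lora_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Illustrative sizes only (assumed, not taken from any specific model).
d_in = d_out = 4096
rank = 8

full = full_layer_params(d_in, d_out)
adapter = lora_adapter_params(d_in, d_out, rank)

print(f"full layer:      {full:,} weights")
print(f"rank-{rank} adapter:  {adapter:,} weights ({adapter / full:.2%} of the layer)")
```

Because only the adapter weights receive gradients and optimizer state, this ratio is what drives the memory savings in the LoRA and QLoRA rows.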

How to choose

For the actual VRAM numbers by model size, see how much GPU memory to fine-tune an LLM.
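For a quick back-of-envelope comparison before consulting those numbers, the following sketch estimates weight-related memory for each method. The byte counts are common rules of thumb and are assumptions here: 16-bit weights and gradients, 12 bytes per trainable parameter for an Adam-style optimizer with fp32 state, and 0.5 bytes per 4-bit weight. It ignores activations, quantization constants, and framework overhead, so real usage will be higher. The function name and the 20M adapter size are hypothetical.

```python
GIB = 1024 ** 3

# Assumed bytes per parameter (rules of thumb, not measured VaultLayer figures).
BYTES_16BIT_WEIGHT = 2.0   # fp16/bf16 weight
BYTES_4BIT_WEIGHT = 0.5    # 4-bit quantized weight, ignoring quantization constants
BYTES_GRADIENT = 2.0       # 16-bit gradient
BYTES_OPTIMIZER = 12.0     # fp32 master copy plus two Adam moments

# Cost of a parameter that is actually trained.
BYTES_TRAINABLE = BYTES_16BIT_WEIGHT + BYTES_GRADIENT + BYTES_OPTIMIZER


def estimate_bytes(mode: str, base_params: int, adapter_params: int = 0) -> float:
    """Rough weight-related memory for a training mode ('full', 'lora', 'qlora')."""
    if mode == "full":
        # Every base weight is trainable.
        return base_params * BYTES_TRAINABLE
    if mode == "lora":
        # Frozen 16-bit base plus trainable adapters.
        return base_params * BYTES_16BIT_WEIGHT + adapter_params * BYTES_TRAINABLE
    if mode == "qlora":
        # Frozen 4-bit base plus trainable adapters.
        return base_params * BYTES_4BIT_WEIGHT + adapter_params * BYTES_TRAINABLE
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    base = 7_000_000_000   # illustrative 7B-parameter model
    adapters = 20_000_000  # hypothetical adapter size
    for mode in ("full", "lora", "qlora"):
        gib = estimate_bytes(mode, base, adapters) / GIB
        print(f"{mode:>5}: ~{gib:.1f} GiB before activations")
```

The ordering matches the table above: full fine-tuning pays the trainable cost on every parameter, while LoRA and QLoRA pay it only on the adapters and differ in how compactly the frozen base is stored.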

Running each on VaultLayer

Pick the method with a single flag: vl run --train-mode qlora|lora|full python train.py, where --train-mode takes one of qlora, lora, or full. Pair it with --model-params to size the GPU for your model. See fine-tune LLMs on your own cloud for the full workflow.

Frequently asked questions

Is QLoRA worse than full fine-tuning?

For many tasks the quality gap is small, and QLoRA makes large models trainable on limited hardware. Full fine-tuning can still win when you need maximum quality or are reshaping the model substantially.

Which should I start with?

QLoRA and LoRA are the usual starting points: they are cheaper, faster, and sufficient for most fine-tunes. Move to a full fine-tune only if the adapter approach leaves quality on the table.
