How much GPU memory do you need to fine-tune an LLM?
VRAM is usually the limiting factor in fine-tuning, and the requirement varies widely by method. QLoRA can fit a 7B model on a 24 GB card, while a full fine-tune of the same model can need one or two 80 GB GPUs. Here's approximate guidance by model size and method.
Approximate VRAM by model size and method
| Model | QLoRA | LoRA | Full fine-tune |
|---|---|---|---|
| 7B | ~12–24 GB | ~24 GB | 1–2× 80 GB |
| 13B | ~24 GB | ~24–48 GB | multi-GPU (80 GB class) |
| 70B | ~48–80 GB | multi-GPU | large multi-GPU cluster |
These are rough starting points, not guarantees — actual usage depends on sequence length, batch size, optimizer, and gradient checkpointing.
What drives the number
- Method first. Full fine-tuning stores gradients and optimizer state for every parameter; LoRA and QLoRA store them only for the adapters, and QLoRA additionally quantizes the frozen base weights (typically to 4-bit). That's the biggest swing. See QLoRA vs LoRA vs full.
- Sequence length and batch size drive activation memory — long context or big batches push VRAM up fast.
- Gradient checkpointing recomputes activations during the backward pass instead of storing them, trading compute for memory, and can bring a job under a VRAM ceiling.
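To make the "method first" point concrete, here is a minimal Python sketch of the static part of the memory budget: weights, gradients, and optimizer state. It is an illustration under stated assumptions, not VaultLayer's sizing logic. The function name `estimate_static_memory`, the byte counts (16-bit weights and gradients, 4-bit QLoRA base weights, 12 bytes per trainable parameter for an fp32 master copy plus two Adam moments), and the 1% `adapter_fraction` are all hypothetical choices. Activations and framework overhead are deliberately left out, so real usage, including the figures in the table above, will be higher.

```python
from dataclasses import dataclass

BYTES_PER_GB = 1e9  # decimal gigabytes, good enough for a rough estimate


@dataclass
class StaticMemory:
    """Memory that does not depend on batch size or sequence length."""
    weights_gb: float
    gradients_gb: float
    optimizer_gb: float

    @property
    def total_gb(self) -> float:
        return self.weights_gb + self.gradients_gb + self.optimizer_gb


def estimate_static_memory(params_billions: float, method: str,
                           adapter_fraction: float = 0.01) -> StaticMemory:
    """Rough weights + gradients + optimizer estimate (assumed byte counts)."""
    n = params_billions * 1e9
    if method == "full":
        # Assumption: 16-bit weights and gradients for every parameter,
        # plus 12 bytes/param of Adam state (fp32 master copy + two moments).
        weights, grads, optim = n * 2, n * 2, n * 12
    elif method in ("lora", "qlora"):
        # Assumption: frozen base weights at 16-bit (LoRA) or 4-bit (QLoRA);
        # only the adapters carry gradients and optimizer state.
        base_bytes = 2.0 if method == "lora" else 0.5
        adapters = n * adapter_fraction  # hypothetical adapter size
        weights = n * base_bytes + adapters * 2
        grads = adapters * 2
        optim = adapters * 12
    else:
        raise ValueError(f"unknown method: {method!r}")
    return StaticMemory(weights / BYTES_PER_GB,
                        grads / BYTES_PER_GB,
                        optim / BYTES_PER_GB)


if __name__ == "__main__":
    for method in ("qlora", "lora", "full"):
        est = estimate_static_memory(7, method)
        print(f"7B {method:>5}: ~{est.total_gb:.1f} GB before activations")
```

Under these assumptions a 7B full fine-tune carries about 112 GB of static state (7 billion parameters at 16 bytes each), which is why it does not fit on a single 24 GB card, while the LoRA and QLoRA totals are dominated by the frozen base weights.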
Sizing a GPU on VaultLayer
Run vl gpus to see available GPU types with their VRAM, and pass --model-params 13 so VaultLayer sizes the GPU for a 13B model. For how the GPU classes compare, see GPU types for training.
Frequently asked questions
Can I fine-tune a 7B model on a 24 GB GPU?
Usually yes with QLoRA, and often with LoRA at modest sequence lengths. A full fine-tune of a 7B model typically needs far more, commonly one or two 80 GB GPUs.
How much VRAM to fine-tune a 70B model?
QLoRA brings a 70B into roughly the 48–80 GB range; LoRA and full fine-tuning need multiple GPUs. Exact numbers depend on sequence length and batch size.
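As a rough check on those ranges, the base weights alone can be estimated as parameter count times bytes per parameter. The 4-bit and 16-bit widths below are assumptions about typical QLoRA and LoRA setups, and the figures exclude adapters, activations, and framework overhead.

```latex
\begin{align*}
\text{4-bit base weights (QLoRA)} &\approx 70 \times 10^{9} \times 0.5\ \text{bytes} = 35\ \text{GB} \\
\text{16-bit base weights (LoRA)} &\approx 70 \times 10^{9} \times 2\ \text{bytes} = 140\ \text{GB}
\end{align*}
```

The 16-bit weights alone exceed a single 80 GB card, which is consistent with LoRA on a 70B model needing multiple GPUs.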