How much GPU memory do you need to fine-tune an LLM?

VRAM is usually the deciding factor in fine-tuning, and it swings enormously with method. QLoRA can fit a 7B model on a 24 GB card, while a full fine-tune of the same model can need multiple 80 GB GPUs. Here's approximate guidance by model size and method.

Approximate VRAM by model size and method

Model	QLoRA	LoRA	Full fine-tune
7B	~12–24 GB	~24 GB	1–2× 80 GB
13B	~24 GB	~24–48 GB	multi-GPU (80 GB class)
70B	~48–80 GB	multi-GPU	large multi-GPU cluster

These are rough starting points, not guarantees. Actual usage depends on sequence length, batch size, optimizer choice, and whether gradient checkpointing is enabled.

What drives the number

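Method first. Full fine-tuning stores gradients and optimizer state for every parameter, while LoRA and QLoRA store them only for the small adapter weights, and QLoRA also keeps the frozen base model in 4-bit precision. That is the biggest swing. See QLoRA vs LoRA vs full.
Sequence length and batch size drive activation memory. It grows with both, so long contexts or big batches push VRAM up fast.
Gradient checkpointing trades compute for memory: it recomputes activations during the backward pass instead of storing them, which can bring a job under a VRAM ceiling.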
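
To see where the method-driven swing comes from, here is a rough back-of-the-envelope sketch in Python. It is not a VaultLayer tool; the function name static_memory_gb and the adapter_fraction parameter are hypothetical. It assumes mixed-precision training with an Adam-style optimizer (2 bytes per 16-bit weight, 2 bytes per gradient, 12 bytes of fp32 optimizer state per trainable parameter), 4-bit base weights for QLoRA, and adapters at an illustrative 1% of the base parameter count. It deliberately ignores activation memory, quantization overhead, and framework overhead, so real usage will be higher.

```python
GB = 1e9  # decimal gigabytes

# Assumed bytes per parameter (mixed precision + Adam-style optimizer).
WEIGHT_BYTES_16BIT = 2.0      # fp16/bf16 weight
WEIGHT_BYTES_4BIT = 0.5       # 4-bit quantized weight (ignores quantization constants)
GRAD_BYTES = 2.0              # 16-bit gradient
OPTIMIZER_BYTES = 12.0        # fp32 master weight + two fp32 Adam moments
TRAINABLE_BYTES = WEIGHT_BYTES_16BIT + GRAD_BYTES + OPTIMIZER_BYTES  # 16 bytes


def static_memory_gb(params_billion, method, adapter_fraction=0.01):
    """Estimate weights + gradients + optimizer state, in GB.

    Excludes activations, so treat the result as a floor, not a budget.
    adapter_fraction is an illustrative assumption, not a measured value.
    """
    n = params_billion * 1e9
    if method == "full":
        # Every parameter is trainable.
        frozen_bytes = 0.0
        trainable = n
    elif method == "lora":
        # Frozen 16-bit base model plus small trainable adapters.
        frozen_bytes = n * WEIGHT_BYTES_16BIT
        trainable = n * adapter_fraction
    elif method == "qlora":
        # Frozen 4-bit base model plus small trainable adapters.
        frozen_bytes = n * WEIGHT_BYTES_4BIT
        trainable = n * adapter_fraction
    else:
        raise ValueError(f"unknown method: {method!r}")
    return (frozen_bytes + trainable * TRAINABLE_BYTES) / GB


if __name__ == "__main__":
    for size in (7, 13, 70):
        row = {m: round(static_memory_gb(size, m), 1) for m in ("qlora", "lora", "full")}
        print(f"{size}B", row)
```

Under these assumptions a full fine-tune of a 7B model needs about 112 GB before any activations are counted, which is why it lands in the one-to-two 80 GB GPU range, while the same model under QLoRA needs only a few GB for weights and adapter state. The gap between these floors and the table above is mostly activation memory, which depends on sequence length and batch size.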

Sizing a GPU on VaultLayer

Run vl gpus to see available GPU types with their VRAM, and pass --model-params 13 so VaultLayer sizes the GPU for a 13B model. For how the GPU classes compare, see GPU types for training.

Frequently asked questions

Can I fine-tune a 7B model on a 24 GB GPU?

Usually yes with QLoRA, and often with LoRA at modest sequence lengths. A full fine-tune of a 7B model typically needs far more, commonly one or two 80 GB GPUs.

How much VRAM to fine-tune a 70B model?

QLoRA brings a 70B model into roughly the 48–80 GB range; LoRA and full fine-tuning need multiple GPUs. Exact numbers depend on sequence length and batch size.
