How much GPU memory do you need to fine-tune an LLM?

VRAM is usually the deciding factor in fine-tuning, and the requirement varies enormously with the method. QLoRA can fit a 7B model on a 24 GB card, while a full fine-tune of the same model can need multiple 80 GB GPUs. Here's approximate guidance by model size and method.

Approximate VRAM by model size and method

Model   QLoRA        LoRA         Full fine-tune
7B      ~12–24 GB    ~24 GB       1–2× 80 GB
13B     ~24 GB       ~24–48 GB    multi-GPU (80 GB class)
70B     ~48–80 GB    multi-GPU    large multi-GPU cluster

These are rough starting points, not guarantees — actual usage depends on sequence length, batch size, optimizer, and gradient checkpointing.
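
To see why the method matters so much, here is a minimal back-of-envelope sketch in Python. It is not a VaultLayer tool, and every name in it is hypothetical. It counts only the "static" memory (weights, gradients, and optimizer state) and assumes bf16 weights, AdamW with mixed precision for full fine-tuning, 4-bit base weights for QLoRA, and adapters that train roughly 1% of the parameters (an illustrative figure, not a measured one). Activations, which scale with sequence length and batch size, are left out, so real usage will be higher.

```python
# Rough static-memory estimate per fine-tuning method.
# All constants are common rules of thumb, not measured values.
BYTES_PER_BASE_PARAM = {
    # bf16 weights (2) + bf16 grads (2) + fp32 master weights (4) + AdamW moments (8)
    "full": 16.0,
    # frozen bf16 base weights only
    "lora": 2.0,
    # frozen 4-bit base weights (ignores quantization constants)
    "qlora": 0.5,
}

# Each trainable adapter parameter is assumed to cost the same 16 bytes
# as a fully trained parameter.
BYTES_PER_TRAINABLE_PARAM = 16.0


def estimate_static_gb(params_billions, method, trainable_fraction=0.01):
    """Estimate weights + gradients + optimizer state in GB (1 GB = 1e9 bytes).

    trainable_fraction is the assumed share of parameters held in
    LoRA/QLoRA adapters; it is ignored for a full fine-tune.
    """
    if method not in BYTES_PER_BASE_PARAM:
        raise ValueError(f"unknown method: {method!r}")
    params = params_billions * 1e9
    base_bytes = params * BYTES_PER_BASE_PARAM[method]
    adapter_bytes = 0.0
    if method != "full":
        adapter_bytes = params * trainable_fraction * BYTES_PER_TRAINABLE_PARAM
    return (base_bytes + adapter_bytes) / 1e9


if __name__ == "__main__":
    for size in (7, 13, 70):
        row = ", ".join(
            f"{m}: {estimate_static_gb(size, m):.1f} GB"
            for m in ("qlora", "lora", "full")
        )
        print(f"{size}B -> {row}")
```

Under these assumptions a 7B model needs about 112 GB of static memory for a full fine-tune but only about 15 GB with LoRA and under 5 GB with QLoRA. The gap between those figures and the table above is mostly activations and framework overhead.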

What drives the number

Sizing a GPU on VaultLayer

Run vl gpus to see available GPU types with their VRAM, and pass --model-params 13 so VaultLayer sizes the GPU for a 13B model. For how the GPU classes compare, see GPU types for training.

Frequently asked questions

Can I fine-tune a 7B model on a 24 GB GPU?

Usually yes with QLoRA, and often with LoRA at modest sequence lengths. A full fine-tune of a 7B typically needs far more — commonly one or two 80 GB GPUs.

How much VRAM to fine-tune a 70B model?

QLoRA brings a 70B into roughly the 48–80 GB range; LoRA and full fine-tuning need multiple GPUs. Exact numbers depend on sequence length and batch size.
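
The multi-GPU claim follows from simple arithmetic, sketched below in Python. The function name and the bytes-per-parameter figures are assumptions for illustration: 0.5 bytes for 4-bit weights, 2 bytes for bf16 weights, and 16 bytes for a mixed-precision AdamW full fine-tune. The result is only a lower bound on the number of 80 GB GPUs, since it ignores activations, adapter state, and sharding overhead.

```python
import math


def min_gpus_for_static_memory(params_billions, bytes_per_param, gpu_gb=80):
    """Lower bound on GPU count needed just to hold the static memory.

    params_billions * bytes_per_param gives GB directly (1 GB = 1e9 bytes).
    Activations and parallelism overhead are not counted.
    """
    static_gb = params_billions * bytes_per_param
    return max(1, math.ceil(static_gb / gpu_gb))


if __name__ == "__main__":
    # 70B model, assumed bytes per parameter for each method
    print("QLoRA:", min_gpus_for_static_memory(70, 0.5))   # 35 GB of 4-bit weights
    print("LoRA: ", min_gpus_for_static_memory(70, 2.0))   # 140 GB of bf16 weights
    print("Full: ", min_gpus_for_static_memory(70, 16.0))  # 1120 GB with AdamW state
```

With these figures, the bf16 weights of a 70B model alone come to about 140 GB, which already exceeds a single 80 GB GPU, while the 4-bit weights come to about 35 GB and leave room for activations on one card.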
