Fine-tune LLMs on your own cloud, reliably
VaultLayer runs LLM fine-tuning jobs — LoRA, QLoRA, or full fine-tunes — on your own cloud or on elastic external GPUs, with no changes to your training code and automatic recovery if a GPU is interrupted.
Your framework, unchanged
VaultLayer wraps the command you already run. PyTorch, Hugging Face Transformers/TRL/Accelerate/PEFT, JAX/Flax, PyTorch Lightning, DeepSpeed, and Axolotl all work — if it runs with python train.py, it runs on VaultLayer:
vl run python train.py
vl run --gpu H100 python train.py # pin a GPU class
vl run --train-mode qlora --model-params 13 python train.py
The default training image ships PyTorch, CUDA, Transformers, Accelerate, PEFT, and TRL, so most fine-tuning scripts run with no extra setup.
Jobs that finish
Fine-tunes are long enough that a single interruption is expensive. VaultLayer checkpoints as the run progresses and resumes from the last checkpointed step if the host fails, so a reclaimed GPU costs you minutes, not the whole job.
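To make the idea concrete, here is a minimal sketch of the general checkpoint-and-resume pattern. It is not VaultLayer's internal mechanism, which this page does not describe. The function names, the JSON checkpoint format, and the fail_at parameter are all hypothetical and exist only for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, step, state):
    """Write the checkpoint atomically so a crash mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path):
    """Return (step, state) from disk, or a fresh start if no checkpoint exists."""
    if not os.path.exists(path):
        return 0, {"loss": None}
    with open(path) as f:
        data = json.load(f)
    return data["step"], data["state"]


def train(path, total_steps, checkpoint_every=10, fail_at=None):
    """Toy training loop. fail_at (hypothetical) simulates a reclaimed GPU."""
    step, state = load_checkpoint(path)  # resume if a checkpoint is present
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated GPU interruption")
        step += 1
        state["loss"] = 1.0 / step  # stand-in for a real optimizer step
        if step % checkpoint_every == 0 or step == total_steps:
            save_checkpoint(path, step, state)
    return step, state
```

In this sketch, an interruption loses only the steps taken since the last saved checkpoint, and calling train again with the same path picks up from there instead of from step zero.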
Your compute, plus overflow
Run on your own cloud or GPU contract first, then add external GPU capacity (H100, A100, L40S, A10G, RTX 4090, and more) through the same CLI when you need supply for a sweep or an urgent run. Each job is isolated on its own provisioned GPU with job-scoped credentials.
Frequently asked questions
Which fine-tuning libraries are supported?
PyTorch, Hugging Face Transformers/TRL/Accelerate/PEFT, JAX/Flax, PyTorch Lightning, DeepSpeed, and Axolotl. Anything that runs with python train.py works.
Can I run LoRA, QLoRA, and full fine-tunes?
Yes. Use --train-mode qlora|lora|full, and pair it with --model-params so the GPU is sized for your model. VaultLayer is built for training and fine-tuning, not real-time inference serving.
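As a rough illustration of why the training mode matters for GPU sizing, the sketch below applies common rules of thumb for memory per parameter. It is not VaultLayer's sizing logic, which this page does not describe. It assumes the model size is given in billions of parameters (the page does not state the unit of --model-params), and it ignores activations, adapter weights, and framework overhead, so real requirements are higher. The function name and table are hypothetical.

```python
# Approximate bytes of GPU memory per base-model parameter (rule of thumb only).
BYTES_PER_PARAM = {
    # fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
    # + two fp32 Adam moments (4 + 4)
    "full": 16.0,
    # frozen fp16 base weights; the small adapter is ignored here
    "lora": 2.0,
    # frozen 4-bit quantized base weights; the small adapter is ignored here
    "qlora": 0.5,
}


def estimate_memory_gb(model_params_billions, train_mode):
    """Return a lower-bound memory estimate in GB for weights and optimizer state."""
    if train_mode not in BYTES_PER_PARAM:
        raise ValueError(f"unknown train mode: {train_mode!r}")
    # 1e9 parameters times bytes per parameter, divided by 1e9 bytes per GB
    return model_params_billions * BYTES_PER_PARAM[train_mode]


if __name__ == "__main__":
    for mode in ("qlora", "lora", "full"):
        print(mode, estimate_memory_gb(13, mode), "GB")
```

Under these assumptions a 13-billion-parameter model needs on the order of 6.5 GB for QLoRA, 26 GB for LoRA, and 208 GB for a full fine-tune before activations, which is why the same model can fit on a single mid-range GPU in one mode and need several large GPUs in another.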
Keep every training job moving.
VaultLayer is in invite-only early access for teams running real GPU workloads.