Spot GPU training that actually finishes
Spot and interruptible GPUs are the cheapest compute there is, because the provider can take them back at any moment. That's a fair trade only if a reclaim costs you minutes — not the run. VaultLayer adds the reliability layer that makes spot capacity safe for real training.
The spot trade-off, honestly
Spot pricing exists because you absorb interruption risk. Unprotected, that risk compounds: every preemption restarts you from zero, so long jobs may never finish and you pay for the same GPU-hours repeatedly. Protected by durable checkpoints plus automatic resume, a preemption costs only the minutes since the last save, and the discount is real.
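To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes reclaims arrive at random at a constant average rate (a Poisson model) and that restarting and checkpointing are free. Both are simplifications, and the function names and example figures are illustrative, not measurements from any provider.

```python
import math

def expected_hours_restart_from_zero(job_hours, reclaims_per_hour):
    """Expected wall-clock hours to get one uninterrupted run of job_hours
    when every reclaim throws away all progress (memoryless reclaims)."""
    lam = reclaims_per_hour
    return (math.exp(lam * job_hours) - 1) / lam

def expected_hours_with_checkpoints(job_hours, reclaims_per_hour, checkpoint_interval_hours):
    """Same model, but a reclaim only loses work since the last checkpoint.
    The job becomes a chain of short segments, each retried independently."""
    segments = job_hours / checkpoint_interval_hours
    per_segment = expected_hours_restart_from_zero(checkpoint_interval_hours, reclaims_per_hour)
    return segments * per_segment

# Hypothetical example: a 24-hour job, one reclaim every 10 hours on average,
# checkpoints every 15 minutes.
unprotected = expected_hours_restart_from_zero(24, 0.1)
protected = expected_hours_with_checkpoints(24, 0.1, 0.25)
print(f"unprotected: {unprotected:.1f} h, protected: {protected:.1f} h")
```

Under these assumptions the unprotected run is expected to burn about 100 GPU-hours to finish 24 hours of work, while the checkpointed run needs about 24.3. The unprotected figure grows exponentially with job length, which is why long jobs are the ones that need protection most.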
How VaultLayer protects spot runs
- Automatic checkpointing to durable storage as the job runs — see how checkpoint & resume works.
- Reclaim detection and auto-resume from the last complete step, no babysitting.
- Cross-provider failover — if a GPU class is being reclaimed hard, resume elsewhere instead of re-queuing into the same shortage. See when a job keeps getting preempted.
- Machine exclusion — hosts that failed a job are avoided on re-provision.
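The first two points follow a general checkpoint-and-resume pattern, which can be sketched in plain Python. This is a minimal illustration of the idea, not VaultLayer's implementation: the file layout, function names, toy "training" loop, and the interrupt_at parameter used to simulate a reclaim are all hypothetical.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write state atomically so a reclaim mid-write never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push bytes to disk before the rename
    os.replace(tmp_path, path)  # atomic swap on the same filesystem

def load_checkpoint(path):
    """Return the last complete checkpoint, or None on a fresh start."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

def train(total_steps, path, save_every=10, interrupt_at=None):
    """Toy loop: 'training' just accumulates a running total of step indices."""
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    while state["step"] < total_steps:
        if interrupt_at is not None and state["step"] == interrupt_at:
            raise RuntimeError("simulated reclaim")  # stand-in for a preemption
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

If a run is interrupted at step 37 with save_every=10, the next call to train picks up from the checkpoint at step 30, so only seven steps are repeated. Writing to a temporary file and renaming it means the checkpoint on disk is always a complete one, which is what "resume from the last complete step" depends on.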
When to use spot vs on-demand
Spot fits long, checkpointable training and fine-tuning, where the savings add up over many GPU-hours. On-demand fits short, deadline-critical runs where interruption risk isn't worth managing. With BYOC (bring your own cloud), both run through the same vl run command on your own account.
Frequently asked questions
Is spot GPU training safe for multi-hour fine-tunes?
Yes, if checkpoint-and-resume is in place — a reclaim then costs minutes of progress. Without it, long spot runs are a gamble that gets worse the longer the job.
Do I need to change my code to survive preemptions?
On VaultLayer, no — vl run wraps your existing script, checkpoints as it runs, and resumes automatically. The optional resume snippet is usually auto-inserted on first run.
Keep every training job moving.
Sign up, install the CLI, and submit your first training job in minutes — on your own cloud or elastic GPU capacity.