Five commands from zero to a finished training run.
pip install vaultlayer
vaultlayer init # one-time authentication
vl run python train.py # submit your training script
vl logs <job_id> --follow # stream logs as the job runs
vl credits # check your balance
Submit any Python training script — the lowest-cost available GPU wins.
vl run python train.py | Submit your script — defaults to the lowest-cost available GPU. |
vl run --gpu H100 python train.py | Pin a specific GPU type (H100, A100, L40S, L4, A10G, RTX 4090, etc.). |
vl run --project nlp python train.py | Tag the run with a project — shows up in vl spend. |
vl run --experiment sft-llama7b python train.py | Tag the run with an experiment name for per-run cost attribution. |
vl run --data s3://bucket/path python train.py | Point the job at your dataset. Supports s3://, gs://, az://, hf:// (Hugging Face), and r2:// (a dataset you uploaded with vl sync). Cloud sources are mirrored once before training starts. See Storage. |
vl run --image your-org/your-image:tag python train.py | Use a custom container image instead of the default training base image. |
vl run --env KEY=VALUE python train.py | Pass an environment variable through to the remote job. Repeat the flag for multiple values. |
vl run --accept-interruptible python train.py | Skip the pre-flight warning when picking an interruptible (consumer-tier) GPU. |
vl run --verbose python train.py | Show extra status output during the run. |
vl run --quiet python train.py | Suppress all output except the final job result. |
vl run --keep-alive 30m python train.py | Hold the instance alive for 30 min after the script exits — pull artifacts, inspect logs while the host is warm, or fix and re-try. Range: 5m–24h, billed at the job's GPU rate. See Keep alive. |
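The flags above are each shown on their own. As a minimal sketch, assuming they can be combined in a single `vl run` invocation (the table does not confirm this), a small Python helper can assemble the command line from the documented options. The helper name `build_vl_run` and its parameters are hypothetical and not part of the VaultLayer CLI; it only builds the argument list and does not submit anything.

```python
import shlex


def build_vl_run(script, gpu=None, project=None, experiment=None,
                 data=None, env=None, keep_alive=None):
    """Assemble a `vl run` argument list from the flags documented above.

    Hypothetical helper: assumes the flags can be combined in one call.
    """
    argv = ["vl", "run"]
    if gpu:
        argv += ["--gpu", gpu]
    if project:
        argv += ["--project", project]
    if experiment:
        argv += ["--experiment", experiment]
    if data:
        argv += ["--data", data]
    # --env is repeated once per variable, as the table describes.
    for key, value in (env or {}).items():
        argv += ["--env", f"{key}={value}"]
    if keep_alive:
        argv += ["--keep-alive", keep_alive]
    argv += ["python", script]
    return argv


cmd = build_vl_run(
    "train.py",
    gpu="H100",
    project="nlp",
    env={"SEED": "0", "WANDB_MODE": "offline"},
)
# Print a shell-safe string you could paste into a terminal.
print(shlex.join(cmd))
```

To actually submit, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where the CLI is installed and authenticated.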
Run vl run --help to see every available option.
Put a requirements.txt next to your script (auto-installed before training), or bring your own image with --image.
Status, logs, GPU health, recent jobs: all from the CLI.
vl status <job_id> | One-shot snapshot of the job's current state. |
vl logs <job_id> | Show recent log output from your script. |
vl logs <job_id> --tail 200 | Show the last 200 lines. |
vl logs <job_id> --follow | Stream log output live as it lands. |
vl gpu-stats <job_id> | Live GPU VRAM, utilization, temperature, and disk usage — useful for tuning batch size. |
vl jobs | Show your job history, most recent first. Jobs in a keep-alive window show a countdown: KEEP_ALIVE (12m04s left). |
vl ps | Show all active and recent jobs (with status). |
The CLI shows the last 15 lines of error output inline on every failure — no extra command needed. For deeper investigation:
vl diagnose <job_id> | One-command post-failure investigation — failure cause, last logs, GPU snapshot, fix suggestions. |
vl logs <job_id> --tail 200 | Pull more error context if vl diagnose isn't enough. |
vl download <job_id> | Download job checkpoints, artifacts, and manifest after a run finishes. |
To catch syntax errors locally before you submit, run python -m py_compile train.py.
After your script exits, the instance is normally torn down so billing stops. Pass --keep-alive to hold it for a debugging window: inspect logs while the host is still warm, pull artifacts, or fix and retry without paying for a fresh cold boot.
vl run --keep-alive 30m python train.py | Submit a job that stays up for 30 min after exit. Range: 5m–24h, billed at the job's GPU rate. |
vl extend <job_id> 20m | Extend the keep-alive window by another 20 min. Caps at 24h total. Not every provider supports mid-window extension — if you hit that, set a longer --keep-alive at submit instead. |
vl terminate <job_id> | End the window early and destroy the instance. Billing stops immediately. Use -y / --yes to skip confirmation. |
vl jobs | Shows a live countdown for jobs still in the window: KEEP_ALIVE (12m04s left). |
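The 5m–24h range and the 24h cap on extensions can be checked locally before submitting. The sketch below assumes durations are written as an integer followed by `m` or `h`, which are the only forms that appear in this document; the CLI may accept others. It also reads "caps at 24h total" as a limit on the whole window, which is an interpretation. The names `parse_keep_alive` and `can_extend` are hypothetical.

```python
import re

MIN_SECONDS = 5 * 60          # documented lower bound: 5m
MAX_SECONDS = 24 * 60 * 60    # documented upper bound: 24h
_UNIT_SECONDS = {"m": 60, "h": 3600}


def parse_keep_alive(text):
    """Convert a value like '30m' or '2h' to seconds, enforcing 5m-24h.

    Assumption: only '<integer>m' and '<integer>h' are accepted.
    """
    match = re.fullmatch(r"(\d+)([mh])", text.strip())
    if not match:
        raise ValueError(f"unrecognized duration: {text!r}")
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"{text!r} is outside the 5m-24h range")
    return seconds


def can_extend(current_seconds, extra_seconds):
    """Return True if extending keeps the total window within 24h.

    Assumption: the cap applies to the total keep-alive window.
    """
    return current_seconds + extra_seconds <= MAX_SECONDS


window = parse_keep_alive("30m")
print(window, can_extend(window, parse_keep_alive("20m")))
```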
Run vl terminate as soon as you're done; don't let an idle window outlive the value of debugging.
vl stop <job_id> | Stop a running job (checkpoints before terminating so you don't lose progress). |
vl restart <job_id> | Restart a suspended or interrupted job from its last checkpoint. |
vl delete-job <job_id> | Delete all saved data for a job. |
See what a job will cost before you submit, and cheaply validate that your script runs end-to-end.
vl estimate python train.py | Estimate the job cost across available GPU options before submission. |
vl gpus | List available GPU types with VRAM and current best price. |
vl env-check | Validate the remote training environment in ~30 seconds (~$0.04) without submitting a full run. |
vl regions | Query available regions for provisioning. |
vl connect | Connect compute providers or data storage to VaultLayer. |
Two ways to get data to a job: upload local data once with vl sync (then reuse it via r2://), or point straight at cloud storage with --data.
vl sync /path/to/data | Upload a local dataset once. Reuse it on any run with --data r2://<dataset-id>. |
vl upload /path/to/data | Upload a dataset (alias of vl sync). |
vl datasets | List uploaded datasets and their r2:// IDs. |
vl datasets delete <dataset-id> | Delete a dataset. Files are purged within 24h and monthly storage billing stops immediately. |
vl download <job_id> | Download a finished job's checkpoints and artifacts to your machine. |
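A mistyped scheme in `--data` is cheap to catch before a job is submitted. The following sketch checks a URI against the five schemes listed in this document using the standard library; the function name `data_scheme` and the dataset ID `my-dataset` are hypothetical, and the check says nothing about whether the bucket or dataset actually exists.

```python
from urllib.parse import urlparse

# Schemes listed in this document for --data.
SUPPORTED_SCHEMES = {"s3", "gs", "az", "hf", "r2"}


def data_scheme(uri):
    """Return the scheme of a --data URI, or raise if it is not listed."""
    scheme = urlparse(uri).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(
            f"{uri!r} does not use one of: {sorted(SUPPORTED_SCHEMES)}"
        )
    return scheme


print(data_scheme("s3://bucket/path"))
print(data_scheme("r2://my-dataset"))  # hypothetical dataset ID
```

A local path such as `/path/to/data` has no scheme and is rejected here, which matches the workflow above: upload it with vl sync first, then reference the resulting r2:// ID.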
Bring your own buckets — jobs read your datasets and write checkpoints back to storage you control, instead of going through an upload.
vl connect storage | Interactive setup for your cloud storage credentials (S3-compatible, Google Cloud Storage, or Azure Blob). |
vl run --data s3://bucket/path python train.py | Point a job at connected storage. Schemes: s3://, gs://, az://, hf://, r2://. |
vl connect | Interactive picker — connect storage or compute to VaultLayer. |
vl connect list | Show which accounts are currently connected. |
vl connect test | Verify a connected account's credentials still work. |
vl connect remove | Remove a connected account. |
Run vl connect anytime to update or replace your connected credentials.
vl credits | Show your current credit balance. |
vl credits buy | Top up your balance — opens a Stripe-hosted checkout in your browser. Credits are added automatically once payment completes. |
vl spend | Spend breakdown by experiment, project, user, or day (last 90 days). |
vl tag <job_id> --project X --experiment Y | Retroactively tag a past job with a project and/or experiment name. |
vaultlayer init --reauth | Re-authenticate if your token expires. |
vl examples | Download ready-to-run example training scripts. |
vl update | Update the VaultLayer CLI to the latest version. |
vl feedback | Submit feedback or a crash report. |
vl --version | Print the installed CLI version. |
vl --help | Show all top-level commands. |
Run vaultlayer init --reauth and enter your account email. If the same message keeps coming back after a successful reauth, check that VAULTLAYER_TOKEN isn't still exported in your shell by an earlier source .env step: run unset VAULTLAYER_TOKEN and retry.
Run vl status <job_id> to confirm the phase.
Run vl logs <job_id> --tail 200. To catch syntax errors locally in about a second (before paying for a GPU), run python -m py_compile your_script.py. This only compiles the file and does not execute it, so a missing package will still surface only at run time.
Check your balance with vl credits, and confirm the file exists with ls your_script.py.
Run vl gpus to see VRAM per option. For live VRAM during a run, use vl gpu-stats <job_id>.
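The local `python -m py_compile` check mentioned above can also be run from Python, for example inside a wrapper that refuses to submit a script that does not compile. This is a minimal sketch using the standard-library `py_compile` module; the function name `compiles_cleanly` is hypothetical. The check catches syntax errors only, because the script is compiled and never executed.

```python
import os
import py_compile
import tempfile


def compiles_cleanly(script_path):
    """Return True if the script byte-compiles, False on a syntax error.

    The compiled file goes to a temporary directory so nothing is left
    next to the script. A missing file raises an OSError as usual.
    """
    with tempfile.TemporaryDirectory() as tmp:
        try:
            py_compile.compile(
                script_path,
                cfile=os.path.join(tmp, "check.pyc"),
                doraise=True,  # raise instead of printing to stderr
            )
        except py_compile.PyCompileError:
            return False
    return True
```

A wrapper could call `compiles_cleanly("train.py")` and only then go on to run `vl run python train.py`.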
Email rahuljain@vaultlayer.cloud for anything — bugs, feature requests, or quick questions on how to use a command. We typically reply within a day.
For one-off feedback or a crash report from the CLI, you can also run vl feedback.