GPU Compute
On-Demand GPUs
Provision on-demand GPU virtual machines for AI training, inference, and compute workloads
Prerequisites:
- A valid Qubrid AI account, logged in to the platform
- Enough credits in your account to provision on-demand GPU virtual machines
On-demand GPUs let you run AI and compute-intensive tasks on high-performance hardware without long-term commitments. Choose a GPU virtual machine from the live pool, pick your template and storage, and launch when capacity is available. Auto-stop and pause options help you control usage and costs.
We don't divide your instances or share them with others. You get access to the entire GPU virtual machine so you can unlock its full potential.
Quick deploy on-demand GPU virtual machines
Head over to the GPU Virtual Machines tab from the left menu
This opens a card view of on-demand GPUs available in the pool.
Select the preferred GPU
Choose from the pool of available on-demand GPU VMs. The pool includes NVIDIA GPUs such as B200, H200, H100, A100, L40S, A10G, T4, and L4. Each card shows RAM, vCPU, storage, and CUDA details.
Each GPU card shows whether that on-demand instance is currently in the pool:
- Available - you can provision this GPU now
- Unavailable - this GPU is not in the pool at the moment
If the GPU you want is not listed or shows as unavailable, click the Refresh button to reload the list and check whether new instances have become available.
Select an AI/ML Template
By default, Ubuntu 22.04 is selected as the template.
To change it, click the Change Template button. A dialog box opens showing the available templates. Click the template you want and it is assigned to the instance.
Change GPU if you want
You have already selected a GPU, but you can still switch to a different one at this step if your needs have changed.
Select the number of GPUs needed
Select 1, 4, or 8 GPUs depending on your use case.
Configurations with 1 or 4 GPUs might not always be offered; the options shown depend on availability. If you need a count that is not listed, or a custom number of GPUs, contact us via Support. vCPU cores and RAM are populated automatically.
Select Root Disk Storage
Choose a size from the dropdown. Options range from 100 GB to 2 TB.
Root Disk Storage is billed at $0.10 per GB per month and is charged even when the instance is stopped. Example: 100 GB costs $10/month.
Select your Interface
You can access the GPU VM via either SSH or Jupyter. To learn how to generate an SSH key, see our documentation. If you want to use Jupyter, provide your Jupyter authentication token.
Configure Auto Stop
Auto Stop shuts down your instance automatically after an amount of time you define. Click the dropdown and select the number of hours after which the instance should stop. If you don't want to use this feature, leave it at the default value of Never.
Review your Instance
The summary on the right side of the dashboard updates with every option you change. Review it once more before launching to confirm that the instance matches your needs.
Select the Commitment Period
Choose On-Demand, or commit to a longer period for larger discounts.
If you select any option other than On-Demand, a request is sent to our team, and someone will reach out to provision the instance for you. Selecting On-Demand gives you instant access when the GPU is marked Available in the pool.
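The decision described above can be sketched as a small function. This is only an illustration of the flow as described on this page; the function name and the returned strings are hypothetical and do not correspond to any platform API.

```python
def launch_outcome(commitment: str, gpu_status: str) -> str:
    """Describe what happens at launch, following the rules stated above.

    `commitment` is the selected commitment period ("On-Demand" or any
    longer-term option); `gpu_status` is the label on the GPU card
    ("Available" or "Unavailable"). Names and return values are hypothetical.
    """
    if commitment != "On-Demand":
        # Any longer commitment goes through a manual request.
        return "request sent to the platform team"
    if gpu_status == "Available":
        return "instant access"
    # On-Demand, but the GPU is not in the pool right now.
    return "refresh and retry when the GPU is available"


print(launch_outcome("On-Demand", "Available"))
```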
Click on Launch
Launch your On-Demand GPU or put in a request for longer terms
Submit your Request (If Longer Commit)
Review your selection and add a note if you have a message for our Platform team.
You can also click on the Reset button in case you want to start all over again.
Choosing the Right GPU
Different workloads require different levels of performance, memory, and cost efficiency. Below is guidance on when to choose each GPU type:
NVIDIA B200 (180 GB)
- Best for: Next-gen large-scale AI model training and high-throughput inference.
- Why: Highest VRAM capacity with extreme bandwidth, optimized for cutting-edge foundation models and multi-trillion parameter research.
NVIDIA H200 (141 GB)
- Best for: Large LLMs, enterprise-scale training, and memory-intensive inference.
- Why: Higher memory than H100 with strong bandwidth, designed for advanced generative AI workloads.
NVIDIA H100 (80 GB)
- Best for: High-performance model training, fine-tuning, and distributed workloads.
- Why: Current industry standard for large model training; excellent tensor performance.
NVIDIA A100 (80 GB / 40 GB)
- Best for: Training and inference at scale; versatile for research and production.
- Why: Proven workhorse for AI/ML; available in 40 GB and 80 GB VRAM options depending on dataset/model size.
NVIDIA L40S (48 GB)
- Best for: Balanced training, inference, and AI-enhanced graphics workloads.
- Why: Strong GPU compute with large memory; good middle ground for enterprises running mixed AI + visualization tasks.
NVIDIA A10G (24 GB)
- Best for: Medium-scale training, fine-tuning, and inference for open-source models.
- Why: Cost-efficient GPU for developers and teams experimenting with models up to mid-range size.
NVIDIA T4 (16 GB)
- Best for: Lightweight inference, prototyping, and smaller-scale AI services.
- Why: Low-cost, energy-efficient option; good for deploying chatbots, RAG pipelines, or small LLMs.
NVIDIA L4 (24 GB)
- Best for: Cloud inference, AI-powered video, and general-purpose ML tasks.
- Why: Modern upgrade over T4 with more memory and stronger inference throughput.
- For research and cutting-edge models: B200, H200, H100.
- For a balance of enterprise training and inference: A100, L40S.
- For developers and startups (cost-efficient): A10G, L4, T4.
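As a rough starting point, you can shortlist GPUs by memory alone. The sketch below uses the VRAM figures listed above and returns every GPU type whose single-card VRAM meets a requirement you supply. The function name and the "smallest VRAM first" ordering are assumptions made for illustration; the sketch ignores multi-GPU configurations, bandwidth, and price, so treat the result as a shortlist rather than a recommendation.

```python
# VRAM per GPU in GB, taken from the list above.
GPU_VRAM_GB = {
    "B200": 180,
    "H200": 141,
    "H100": 80,
    "A100 (80 GB)": 80,
    "A100 (40 GB)": 40,
    "L40S": 48,
    "A10G": 24,
    "L4": 24,
    "T4": 16,
}


def gpus_that_fit(required_vram_gb: float) -> list:
    """Return GPU names with enough VRAM, smallest VRAM first.

    `required_vram_gb` is your own estimate of the memory the workload
    needs on a single GPU (hypothetical input, not provided by the platform).
    """
    candidates = [
        (vram, name)
        for name, vram in GPU_VRAM_GB.items()
        if vram >= required_vram_gb
    ]
    # Sort by VRAM, then by name so that ties are ordered deterministically.
    return [name for vram, name in sorted(candidates)]


print(gpus_that_fit(60))  # ['A100 (80 GB)', 'H100', 'H200', 'B200']
```

An empty list means no single GPU in the table has enough memory, in which case a multi-GPU configuration is worth considering.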
Root Disk (GB)
The Root Disk is the primary storage attached to your GPU instance. It holds the operating system, dependencies, and any data you store locally.
Key Points
- Default Size: Each instance comes with a default root disk (e.g., 100 GB).
- Customizable: You can increase disk size at deployment time to accommodate datasets, models, or logs.
- Persistent Billing: Root disk storage is billed at $0.10 per GB per month and is charged even if the instance is stopped.
- Example: A 100 GB disk costs $10 per month.
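The storage charge is simple arithmetic, and a small helper makes it easy to compare disk sizes before you deploy. This is a minimal sketch using the $0.10 per GB per month rate stated above; the function name is hypothetical, and the calculation uses integer cents to avoid floating-point rounding surprises.

```python
ROOT_DISK_CENTS_PER_GB_MONTH = 10  # $0.10 per GB per month, as stated above


def root_disk_monthly_cost(size_gb: int) -> float:
    """Return the monthly root disk charge in USD for a disk of `size_gb`."""
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    return size_gb * ROOT_DISK_CENTS_PER_GB_MONTH / 100


print(root_disk_monthly_cost(100))  # 10.0
print(root_disk_monthly_cost(500))  # 50.0
```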
When to Increase Root Disk
- Training large models that require big datasets stored locally.
- Running workflows that generate heavy intermediate files or logs.
- Deploying multiple frameworks or custom libraries on the same instance.
When to Keep It Minimal
- Using external object storage or mounted volumes for datasets.
- Running lightweight inference or stateless applications.
- Optimizing costs when persistent local storage is not required.
Pausing on-demand GPU virtual machines
You can pause or stop on-demand GPU virtual machines when they are not in use to reduce costs.
Key Points
- No Compute Charges: When an instance is paused, you are not charged for GPU compute.
- Storage Charges Continue: While paused, you will still be charged for the root disk and any attached storage.
- Example: If you pause a GPU instance with a 100 GB root disk, compute costs stop immediately, but storage charges continue at $10/month until you delete the disk.
Pause instances when not running jobs to save on compute costs. Delete unused disks or move data to external object storage if long-term retention is not required.
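To see how much pausing can save, compare an always-on month against a month in which the instance runs only while jobs are active. The sketch below assumes compute is billed per running hour at a flat rate; the $2.00 hourly rate is a made-up placeholder, not a published price, and the function name is hypothetical. Only the root disk rate ($0.10 per GB per month) comes from this page, and any additional attached storage is left out.

```python
def estimated_monthly_cost(
    running_hours: float,
    gpu_usd_per_hour: float,
    root_disk_gb: int,
) -> float:
    """Estimate one month's bill: compute for running hours plus root disk.

    Assumes flat per-hour compute billing (an assumption, not stated in the
    docs). The root disk is billed for the whole month whether or not the
    instance is running.
    """
    compute = running_hours * gpu_usd_per_hour
    storage = root_disk_gb * 10 / 100  # $0.10 per GB per month
    return compute + storage


HYPOTHETICAL_RATE = 2.00  # USD per hour, placeholder value for illustration

always_on = estimated_monthly_cost(720, HYPOTHETICAL_RATE, 100)
paused_when_idle = estimated_monthly_cost(160, HYPOTHETICAL_RATE, 100)
print(always_on, paused_when_idle)  # 1450.0 330.0
```

In both cases the $10 storage charge is the same; only the compute portion changes with running hours.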
Auto Stop
The Auto Stop feature lets you automatically shut down on-demand GPU virtual machines after a specified period of inactivity or based on a timer you define.
Key Points
- Automatic Shutdown: Instances will automatically stop after the configured time limit.
- User Defined: You can set the time (e.g., 1 hour, 6 hours, 24 hours) based on your workflow.
- Save costs: Prevents on-demand GPU VMs from running idle and accumulating unnecessary compute charges.
- Storage Charges Remain: When an instance is auto-stopped, compute charges stop, but root disk and attached storage charges continue.
Example
- You launch a GPU instance with a 6-hour Auto Stop setting.
- After 6 hours, the instance automatically shuts down if still running.
- Compute charges end immediately, but storage continues to be billed.
Always enable Auto Stop for experiments, prototyping, or jobs with predictable runtimes. Use manual control (Pause/Resume) for production workloads that need to stay online.
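When planning a job, it helps to know when the timer will fire. The following sketch computes the expected shutdown time from a launch time and an Auto Stop setting in hours, treating `None` as the Never option. The function name is hypothetical, and the platform's actual shutdown moment may differ slightly from this simple calculation.

```python
from datetime import datetime, timedelta
from typing import Optional


def auto_stop_time(
    launched_at: datetime, auto_stop_hours: Optional[int]
) -> Optional[datetime]:
    """Return when the instance is expected to stop, or None for Never."""
    if auto_stop_hours is None:
        # Auto Stop left at the default value of Never.
        return None
    return launched_at + timedelta(hours=auto_stop_hours)


launch = datetime(2025, 1, 1, 9, 0)
print(auto_stop_time(launch, 6))  # 2025-01-01 15:00:00
```

If your job's expected runtime is longer than the gap between launch and the computed stop time, choose a larger Auto Stop value before launching.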