Inferencing

Serverless Models

Model catalog for managed, on-demand inference on Qubrid AI.

Qubrid serverless inference gives you immediate access to hosted models over a standard API. There is no infrastructure to deploy: you send requests, and Qubrid runs the model on shared capacity. Billing is usage-based-typically per token for text models-so cost scales with actual consumption rather than reserved capacity.

Shared endpoints apply fair-use rate limits. They are a strong fit for development, benchmarks, and workloads with unpredictable volume. If you need predictable latency, sustained throughput, or private hardware, deploy on GPU Instances or GPU Clusters instead.

The serverless catalog and self-managed GPU deployments do not always expose the same model lineup. Confirm model IDs in the tables below, then validate availability and pricing in the playground before shipping to production.

Pricing

Charges are metered from actual usage. There are no upfront commitments, idle fees, or cluster management overhead for serverless calls. Each table lists the API identifier and context window; list prices per million input tokens are shown where applicable.

For asynchronous or high-volume pipelines, ask support about batch pricing and enterprise rate cards.

Models

Jump to a category:

Text models

Organization
 
Model name
 
API model string
 
Context
length
Input pricing
(per 1M tokens)
Cached input pricing
(per 1M tokens)
Output pricing
(per 1M tokens)
Quantization
 
QwenQwen3.7 Max1,000,000$2.50-$7.50-
Z.aiGLM 5202,752$0.57$0.12$0.58FP4
DeepSeekDeepSeek V4 Flash1,000,000$0.14$0.028$0.28FP4
DeepSeekDeepSeek V4 Pro1,000,000$1.65$0.14$3.30FP4
NVIDIANemotron 3 Nano Omni256,000$0.069-$0.28BF16
QwenQwen3.6 Max Preview262,144$1.30-$7.80-
DeepSeekDeepSeek V3.2163,000$0.29$0.058$0.43-
Z.aiGLM 4.7202,752$0.43$0.086$2.01FP4
OpenAIGPT OSS 120B128,000$0.12$0.012$0.48MXFP4
MiniMaxMiniMax M2.7204,800$0.30$0.06$1.20FP4
NVIDIANVIDIA Nemotron 3 Super 120B A12B1,000,000$0.35-$1.04FP8
MiniMaxMiniMax M2.5196,608$0.30$0.061$1.21-
NVIDIANVIDIA Nemotron 3 Nano 30B A3B128,000$0.040$0.016$0.16BF16
Moonshot AIKimi K2 Thinking262,144$0.57$0.12$2.29FP4
QwenQwen3 Next 80B A3B Thinking262,144$0.15-$1.20-
Moonshot AIKimi K2 Instruct128,000$0.57$0.12$2.29-
DeepSeekDeepSeek R1 0528163,840$0.57$0.057$2.29-
QwenQwen3 Max256,000$1.20-$6.00-
MicrosoftFara 7B128,000$0.17-$0.20-
DeepSeekDeepSeek R1 Distill Llama 70B131,100$0.56-$0.64-
DeepSeekDeepSeek V3128,000$0.29$0.057$1.15-
MistralAIMistral 7B Instruct v0.332,768$0.088-$0.15-
MetaLlama 3.3 70B Instruct128,000$0.096-$0.30-

Code models

Organization
 
Model name
 
API model string
 
Context
length
Input pricing
(per 1M tokens)
Cached input pricing
(per 1M tokens)
Output pricing
(per 1M tokens)
Quantization
 
QwenQwen3 Coder Plus262,000$1.00$0.20$5.00-
QwenQwen3 Coder Next262,144$0.30-$1.50-
QwenQwen3 Coder 30B A3B Instruct262,114$0.45-$2.25-
QwenQwen3 Coder 480B A35B Instruct262,114$1.50-$7.50-
QwenQwen3 Coder Flash262,114$0.30$0.060$1.50-

Vision models

Organization
 
Model name
 
API model string
 
Context
length
Input pricing
(per 1M tokens)
Cached input pricing
(per 1M tokens)
Output pricing
(per 1M tokens)
Quantization
 
MiniMaxMiniMax M31,000,000$0.30$0.060$1.20-
QwenQwen3.7 Plus256,000$0.40-$1.60-
QwenQwen3.6 Plus256,000$0.50-$3.00-
Moonshot AIKimi K2.5256,000$0.57$0.12$3.01-
Moonshot AIKimi K2.6256,000$0.89$0.18$3.71-
QwenQwen3.6 27B256,000$0.60-$3.60-
QwenQwen3.6 35B A3B256,000$0.25-$1.49-
QwenQwen3.5 122B A10B256,000$0.40-$3.20-
QwenQwen3.5 27B256,000$0.30-$2.40-
QwenQwen3.5 35B A3B256,000$0.25-$2.00-
QwenQwen3.5 397B A17B256,000$0.60-$3.60-
QwenQwen3.5 Flash1,000,000$0.10$0.010$0.40-
QwenQwen3.5 Plus1,000,000$0.40-$2.40-
QwenQwen3 VL 235B A22B Thinking256,000$0.40-$4.00-
QwenQwen3 VL 235B A22B Instruct256,000$0.40-$1.60-
QwenQwen3 VL 30B A3B Instruct256,000$0.10-$0.10-
QwenQwen3 VL 8B Instruct256,000$0.064-$0.064-
QwenQwen3 VL Flash256,000$0.050$0.010$0.40-
QwenQwen3 VL Plus256,000$0.20-$1.60-

OCR models

Organization
 
Model name
 
API model string
 
Context
length
Input pricing
(per 1M tokens)
Cached input pricing
(per 1M tokens)
Output pricing
(per 1M tokens)
Quantization
 
Tencent HunyuanHunyuanOCR16,000$0.21-$0.35-