Serverless Models
Model catalog for managed, on-demand inference on Qubrid AI.
Qubrid serverless inference gives you immediate access to hosted models over a standard API. There is no infrastructure to deploy: you send requests, and Qubrid runs the model on shared capacity. Billing is usage-based (typically per token for text models), so cost scales with actual consumption rather than reserved capacity.
Shared endpoints apply fair-use rate limits. They are a strong fit for development, benchmarks, and workloads with unpredictable volume. If you need predictable latency, sustained throughput, or private hardware, deploy on GPU Instances or GPU Clusters instead.
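Because shared endpoints enforce fair-use rate limits, clients should expect occasional throttled responses and retry them instead of failing outright. The sketch below retries a call with exponential backoff and full jitter. It assumes your client raises some rate-limit exception when throttled; `RateLimitError` here is a hypothetical placeholder, so swap in whatever your HTTP client or SDK actually raises (for example on an HTTP 429 response). No Qubrid-specific API is assumed.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical placeholder for the error your client raises when throttled."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on RateLimitError with exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Upper bound grows as base_delay * 2^attempt, capped at max_delay;
            # sleeping a random fraction of it spreads retries from many clients apart.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise ValueError("max_retries must be >= 0")


# Usage: a stand-in request that is throttled twice, then succeeds.
attempts = {"n": 0}


def flaky_request() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("throttled")
    return "ok"


print(call_with_backoff(flaky_request, sleep=lambda s: None))  # prints "ok"
```

Injecting `sleep` keeps the helper testable; in production, leave it as `time.sleep`. If throttling persists after several retries, that is a signal to move the workload to dedicated GPU capacity rather than to raise the retry count.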
The serverless catalog and self-managed GPU deployments do not always expose the same model lineup. Confirm model IDs in the tables below, then validate availability and pricing in the playground before shipping to production.
Pricing
Charges are metered from actual usage, with no upfront commitments, idle fees, or cluster management overhead for serverless calls. Each table lists the API model string, the context length in tokens, list prices per million tokens for input, cached input, and output, and the quantization format. A dash (-) marks a value that is not listed for that model.
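Because prices are quoted per million tokens, the cost of a workload is each token count divided by 1,000,000, multiplied by the matching rate, and summed. The sketch below assumes that cached input tokens are a subset of input tokens and are billed at the cached rate instead of the standard input rate, and that models without a cached price bill all input at the standard rate. These are assumptions, not confirmed billing rules, so verify them in the playground or with support. The example uses the GLM 5 list prices from the text models table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    """List prices per 1M tokens, as quoted in the tables."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: Optional[float] = None  # None where the table shows "-"


def estimate_cost(p: Pricing, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of a request or a batch of requests.

    Assumption: cached_tokens is a subset of input_tokens and is billed at the
    cached rate when the model lists one; otherwise it is billed as normal input.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    cached_rate = p.cached_input_per_m if p.cached_input_per_m is not None else p.input_per_m
    uncached = input_tokens - cached_tokens
    return (
        uncached * p.input_per_m
        + cached_tokens * cached_rate
        + output_tokens * p.output_per_m
    ) / 1_000_000


# GLM 5: $0.57 input, $0.12 cached input, $0.58 output per 1M tokens.
glm_5 = Pricing(input_per_m=0.57, output_per_m=0.58, cached_input_per_m=0.12)

# 2M input tokens (500K of them served from cache) and 1M output tokens:
# 1.5 * 0.57 + 0.5 * 0.12 + 1.0 * 0.58 = 1.495
print(round(estimate_cost(glm_5, 2_000_000, 1_000_000, cached_tokens=500_000), 4))  # prints 1.495
```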
For asynchronous or high-volume pipelines, ask support about batch pricing and enterprise rate cards.
Models
Text models
| Organization | Model name | API model string | Context length | Input pricing (per 1M tokens) | Cached input pricing (per 1M tokens) | Output pricing (per 1M tokens) | Quantization |
|---|---|---|---|---|---|---|---|
| Qwen | Qwen3.7 Max |  | 1,000,000 | $2.50 | - | $7.50 | - |
| Z.ai | GLM 5 |  | 202,752 | $0.57 | $0.12 | $0.58 | FP4 |
| DeepSeek | DeepSeek V4 Flash |  | 1,000,000 | $0.14 | $0.028 | $0.28 | FP4 |
| DeepSeek | DeepSeek V4 Pro |  | 1,000,000 | $1.65 | $0.14 | $3.30 | FP4 |
| NVIDIA | Nemotron 3 Nano Omni |  | 256,000 | $0.069 | - | $0.28 | BF16 |
| Qwen | Qwen3.6 Max Preview |  | 262,144 | $1.30 | - | $7.80 | - |
| DeepSeek | DeepSeek V3.2 |  | 163,000 | $0.29 | $0.058 | $0.43 | - |
| Z.ai | GLM 4.7 |  | 202,752 | $0.43 | $0.086 | $2.01 | FP4 |
| OpenAI | GPT OSS 120B |  | 128,000 | $0.12 | $0.012 | $0.48 | MXFP4 |
| MiniMax | MiniMax M2.7 |  | 204,800 | $0.30 | $0.06 | $1.20 | FP4 |
| NVIDIA | NVIDIA Nemotron 3 Super 120B A12B |  | 1,000,000 | $0.35 | - | $1.04 | FP8 |
| MiniMax | MiniMax M2.5 |  | 196,608 | $0.30 | $0.061 | $1.21 | - |
| NVIDIA | NVIDIA Nemotron 3 Nano 30B A3B |  | 128,000 | $0.040 | $0.016 | $0.16 | BF16 |
| Moonshot AI | Kimi K2 Thinking |  | 262,144 | $0.57 | $0.12 | $2.29 | FP4 |
| Qwen | Qwen3 Next 80B A3B Thinking |  | 262,144 | $0.15 | - | $1.20 | - |
| Moonshot AI | Kimi K2 Instruct |  | 128,000 | $0.57 | $0.12 | $2.29 | - |
| DeepSeek | DeepSeek R1 0528 |  | 163,840 | $0.57 | $0.057 | $2.29 | - |
| Qwen | Qwen3 Max |  | 256,000 | $1.20 | - | $6.00 | - |
| Microsoft | Fara 7B |  | 128,000 | $0.17 | - | $0.20 | - |
| DeepSeek | DeepSeek R1 Distill Llama 70B |  | 131,100 | $0.56 | - | $0.64 | - |
| DeepSeek | DeepSeek V3 |  | 128,000 | $0.29 | $0.057 | $1.15 | - |
| MistralAI | Mistral 7B Instruct v0.3 |  | 32,768 | $0.088 | - | $0.15 | - |
| Meta | Llama 3.3 70B Instruct |  | 128,000 | $0.096 | - | $0.30 | - |
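To shortlist models from the text table, filter by the context length your prompts need and rank the remaining models by estimated cost for your expected token volume. The sketch below hard-codes a few rows transcribed from the table above (names, context lengths, and input/output list prices only). It ignores cached-input discounts, output quality, and latency, and the prices may change, so treat the ranking as a starting point and confirm IDs and prices in the playground.

```python
from typing import List, NamedTuple, Tuple


class TextModel(NamedTuple):
    name: str
    context_length: int
    input_per_m: float   # list price per 1M input tokens
    output_per_m: float  # list price per 1M output tokens


# A few rows transcribed from the text models table.
CATALOG = [
    TextModel("DeepSeek V4 Flash", 1_000_000, 0.14, 0.28),
    TextModel("NVIDIA Nemotron 3 Super 120B A12B", 1_000_000, 0.35, 1.04),
    TextModel("Qwen3.7 Max", 1_000_000, 2.50, 7.50),
    TextModel("GPT OSS 120B", 128_000, 0.12, 0.48),
    TextModel("Llama 3.3 70B Instruct", 128_000, 0.096, 0.30),
]


def shortlist(
    min_context: int,
    input_tokens: int,
    output_tokens: int,
    catalog: List[TextModel] = CATALOG,
) -> List[Tuple[str, float]]:
    """Return (name, estimated cost) for models whose context window is at
    least min_context, cheapest first, for the given token volume."""
    rows = []
    for m in catalog:
        if m.context_length >= min_context:
            cost = (input_tokens * m.input_per_m + output_tokens * m.output_per_m) / 1_000_000
            rows.append((m.name, round(cost, 2)))
    return sorted(rows, key=lambda r: r[1])


# 50M input + 10M output tokens per month, with prompts up to 200K tokens.
# The two 128K-context models are filtered out before ranking.
for name, cost in shortlist(200_000, 50_000_000, 10_000_000):
    print(f"{name}: ${cost:.2f}")
```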
Code models
| Organization | Model name | API model string | Context length | Input pricing (per 1M tokens) | Cached input pricing (per 1M tokens) | Output pricing (per 1M tokens) | Quantization |
|---|---|---|---|---|---|---|---|
| Qwen | Qwen3 Coder Plus |  | 262,000 | $1.00 | $0.20 | $5.00 | - |
| Qwen | Qwen3 Coder Next |  | 262,144 | $0.30 | - | $1.50 | - |
| Qwen | Qwen3 Coder 30B A3B Instruct |  | 262,114 | $0.45 | - | $2.25 | - |
| Qwen | Qwen3 Coder 480B A35B Instruct |  | 262,114 | $1.50 | - | $7.50 | - |
| Qwen | Qwen3 Coder Flash |  | 262,114 | $0.30 | $0.060 | $1.50 | - |
Vision models
| Organization | Model name | API model string | Context length | Input pricing (per 1M tokens) | Cached input pricing (per 1M tokens) | Output pricing (per 1M tokens) | Quantization |
|---|---|---|---|---|---|---|---|
| MiniMax | MiniMax M3 |  | 1,000,000 | $0.30 | $0.060 | $1.20 | - |
| Qwen | Qwen3.7 Plus |  | 256,000 | $0.40 | - | $1.60 | - |
| Qwen | Qwen3.6 Plus |  | 256,000 | $0.50 | - | $3.00 | - |
| Moonshot AI | Kimi K2.5 |  | 256,000 | $0.57 | $0.12 | $3.01 | - |
| Moonshot AI | Kimi K2.6 |  | 256,000 | $0.89 | $0.18 | $3.71 | - |
| Qwen | Qwen3.6 27B |  | 256,000 | $0.60 | - | $3.60 | - |
| Qwen | Qwen3.6 35B A3B |  | 256,000 | $0.25 | - | $1.49 | - |
| Qwen | Qwen3.5 122B A10B |  | 256,000 | $0.40 | - | $3.20 | - |
| Qwen | Qwen3.5 27B |  | 256,000 | $0.30 | - | $2.40 | - |
| Qwen | Qwen3.5 35B A3B |  | 256,000 | $0.25 | - | $2.00 | - |
| Qwen | Qwen3.5 397B A17B |  | 256,000 | $0.60 | - | $3.60 | - |
| Qwen | Qwen3.5 Flash |  | 1,000,000 | $0.10 | $0.010 | $0.40 | - |
| Qwen | Qwen3.5 Plus |  | 1,000,000 | $0.40 | - | $2.40 | - |
| Qwen | Qwen3 VL 235B A22B Thinking |  | 256,000 | $0.40 | - | $4.00 | - |
| Qwen | Qwen3 VL 235B A22B Instruct |  | 256,000 | $0.40 | - | $1.60 | - |
| Qwen | Qwen3 VL 30B A3B Instruct |  | 256,000 | $0.10 | - | $0.10 | - |
| Qwen | Qwen3 VL 8B Instruct |  | 256,000 | $0.064 | - | $0.064 | - |
| Qwen | Qwen3 VL Flash |  | 256,000 | $0.050 | $0.010 | $0.40 | - |
| Qwen | Qwen3 VL Plus |  | 256,000 | $0.20 | - | $1.60 | - |
OCR models
| Organization | Model name | API model string | Context length | Input pricing (per 1M tokens) | Cached input pricing (per 1M tokens) | Output pricing (per 1M tokens) | Quantization |
|---|---|---|---|---|---|---|---|
| Tencent Hunyuan | HunyuanOCR |  | 16,000 | $0.21 | - | $0.35 | - |