Serverless Models

Model catalog for managed, on-demand inference on Qubrid AI.

Qubrid serverless inference gives you immediate access to hosted models over a standard API. There is no infrastructure to deploy: you send requests, and Qubrid runs the model on shared capacity. Billing is usage-based, typically per token for text models, so cost scales with actual consumption rather than reserved capacity.

Shared endpoints apply fair-use rate limits. They are a strong fit for development, benchmarks, and workloads with unpredictable volume. If you need predictable latency, sustained throughput, or private hardware, deploy on On-Demand GPUs or GPU Clusters instead.
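
Because shared endpoints are rate limited, clients commonly retry throttled requests with exponential backoff. The following is a minimal sketch, assuming your client surfaces a rate-limit response as an exception. `RateLimitError` and `call_with_backoff` are hypothetical names used only for illustration; this page does not document how Qubrid signals throttling or what the actual limits are.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical: stands in for whatever error your client raises when throttled."""


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call send_request(), retrying on RateLimitError with jittered exponential backoff.

    send_request is any zero-argument callable that performs one API request.
    The delay ceiling doubles on each retry (base_delay, 2x, 4x, ...) up to max_delay.
    """
    for attempt in range(max_attempts):
        try:
            return send_request()
        except RateLimitError:
            # Give up and re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": sleep a random time in [0, ceiling] so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

The defaults here (five attempts, one-second base delay) are placeholders rather than recommended values; tune them against the limits you observe in practice.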

The serverless catalog and self-managed GPU deployments do not always expose the same model lineup. Confirm model IDs in the tables below, then validate availability and pricing in the playground before shipping to production.

Pricing

Charges are metered from actual usage. There are no upfront commitments, idle fees, or cluster management overhead for serverless calls. Each table lists the API identifier, context window, and per-million-token prices. Some models use tiered pricing based on input length; the applicable tier is shown in the Pricing tier column.
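
As a rough illustration of how tiered, per-million-token pricing turns into a request cost, the sketch below uses the two GLM 5 rows from the text models table. It assumes that "32k" and "200k" mean 32,000 and 200,000 tokens, and that the whole request is billed at the tier selected by its input length. Neither detail is stated on this page, so confirm both before relying on the numbers. `Tier` and `estimate_cost` are hypothetical names, not part of any Qubrid SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    max_input_tokens: int       # upper bound of the tier (assumed inclusive)
    input_per_million: float    # USD per 1M input tokens
    output_per_million: float   # USD per 1M output tokens


# GLM 5 prices copied from the text models table.
# Assumption: "32k" = 32,000 tokens and "200k" = 200,000 tokens.
GLM_5_TIERS = [
    Tier(32_000, 0.573, 2.58),
    Tier(200_000, 0.86, 3.154),
]


def estimate_cost(tiers, input_tokens, output_tokens):
    """Estimate the USD cost of one request.

    Assumption: the tier is chosen by input length and its prices apply
    to every input and output token in the request.
    """
    for tier in sorted(tiers, key=lambda t: t.max_input_tokens):
        if input_tokens <= tier.max_input_tokens:
            return (input_tokens * tier.input_per_million
                    + output_tokens * tier.output_per_million) / 1_000_000
    raise ValueError("input length exceeds the largest pricing tier")


if __name__ == "__main__":
    cost = estimate_cost(GLM_5_TIERS, input_tokens=10_000, output_tokens=1_000)
    print(f"${cost:.5f}")  # prints $0.00831
```

A model with flat pricing (a `-` in the Pricing tier column) fits the same function as a single tier whose upper bound is the model's context length.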

Each table shows Input and Output prices for every model. Hover any row to open a breakdown of every other billing mode for that model, such as thinking, cache, and batch.

For asynchronous or high-volume pipelines, ask support about batch pricing and enterprise rate cards.

Models

Text models

Hover any row to see additional billing modes (thinking, cache, batch, etc.).

Organization	Model name	Context length	Pricing tier	Input (per 1M tokens)	Output (per 1M tokens)	Input (Thinking) (per 1M tokens)	Input (Implicit Cache) (per 1M tokens)	Input (Batch File) (per 1M tokens)	Explicit Cache Creation (per 1M tokens)	Explicit Cache Read (per 1M tokens)	Input (Batch Chat) (per 1M tokens)	Output (Batch File) (per 1M tokens)	Output (Batch Chat) (per 1M tokens)	Quantization
Qwen	Qwen3.7 Max	1,000,000	-	$2.5	$7.5	$0.5	$3.125	$0.25	-	-	-	-	-	-
Z.ai	GLM 5	202,752	≤32k tokens	$0.573	$2.58	$0.115	-	-	-	-	-	-	-	FP4
Z.ai	GLM 5	202,752	32k–200k tokens	$0.86	$3.154	$0.172	-	-	-	-	-	-	-	FP4
Z.ai	GLM 5.1	202,752	-	$1.68	$5.28	$0.312	-	-	-	-	-	-	-	FP4
Z.ai	GLM 5.2	202,752	-	$1.1	$3.851	$0.275	-	-	-	-	-	-	-	FP4
Z.ai	GLM 5 Turbo	202,752	-	$1.44	$4.8	$0.288	-	-	-	-	-	-	-	FP4
Z.ai	GLM 4.5 Air	128,000	-	$0.24	$1.32	$0.036	-	-	-	-	-	-	-	FP4
Z.ai	GLM 4.5 AirX	128,000	-	$1.32	$5.4	$0.264	-	-	-	-	-	-	-	FP4
Z.ai	GLM 4.5 X	202,752	-	$2.64	$10.68	$0.54	-	-	-	-	-	-	-	FP4
Z.ai	GLM 4.5 Flash	128,000	-	$0	$0	$0	-	-	-	-	-	-	-	FP4
Z.ai	GLM 4.7 Flash	128,000	-	$0	$0	$0	-	-	-	-	-	-	-	FP4
Z.ai	GLM 4.7 FlashX	128,000	-	$0.084	$0.48	$0.012	-	-	-	-	-	-	-	FP4
Z.ai	GLM 4 32B 0414 128K	128,000	-	$0.12	$0.12	-	-	-	-	-	-	-	-	FP4
DeepSeek	DeepSeek V4 Flash	1,000,000	-	$0.138	$0.275	$0.028	-	-	-	-	-	-	-	FP4
DeepSeek	DeepSeek V4 Pro	1,000,000	-	$1.65	$3.301	$0.138	-	-	-	-	-	-	-	FP4
NVIDIA	Nemotron 3 Nano Omni	256,000	-	$0.069	$0.276	-	-	-	-	-	-	-	-	BF16
Qwen	Qwen3.6 Max Preview	262,144	≤128k tokens	$1.3	$7.8	$1.625	$0.13	-	-	-	-	-	-	-
Qwen	Qwen3.6 Max Preview	262,144	128k–256k tokens	$2	$12	$2.5	$0.2	-	-	-	-	-	-	-
DeepSeek	DeepSeek V3.2	163,000	-	$0.287	$0.216	$0.058	$0.144	$0.359	$0.029	$0.287	$0.431	$0.431	-	-
Z.ai	GLM 4.7	202,752	-	$0.72	$2.64	$0.132	-	-	-	-	-	-	-	FP4
Z.ai	GLM 4.5	202,752	-	$0.72	$2.64	$0.132	-	-	-	-	-	-	-	FP4
Z.ai	GLM 4.6	202,752	-	$0.72	$2.64	$0.132	-	-	-	-	-	-	-	FP4
MiniMax	MiniMax M2.7	204,800	-	$0.36	$1.44	$0.45	$0.072	-	-	-	-	-	-	FP4
NVIDIA	NVIDIA Nemotron 3 Super 120B A12B	1,000,000	-	$0.345	$1.035	-	-	-	-	-	-	-	-	FP8
MiniMax	MiniMax M2.5	196,608	-	$0.304	$1.213	$0.061	-	-	-	-	-	-	-	-
NVIDIA	NVIDIA Nemotron 3 Nano 30B A3B	128,000	-	$0.04	$0.16	-	-	-	-	-	-	-	-	BF16
Moonshot AI	Kimi K2 Thinking	262,144	-	$0.574	$2.294	$0.115	-	-	-	-	-	-	-	FP4
Qwen	Qwen3 Next 80B A3B Thinking	262,144	-	$0.15	$1.2	-	-	-	-	-	-	-	-	-
Moonshot AI	Kimi K2 Instruct	128,000	-	$0.574	$2.294	$0.115	-	-	-	-	-	-	-	-
DeepSeek	DeepSeek R1 0528	163,840	-	$0.574	$2.294	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Max	256,000	≤32k tokens	$1.2	$6	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Max	256,000	32k–128k tokens	$2.4	$12	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Max	256,000	128k–256k tokens	$3	$15	-	-	-	-	-	-	-	-	-
Microsoft	Fara 7B	128,000	-	$0.168	$0.2	-	-	-	-	-	-	-	-	-
DeepSeek	DeepSeek R1 Distill Llama 70B	131,100	-	$0.56	$0.64	-	-	-	-	-	-	-	-	-
DeepSeek	DeepSeek V3	128,000	-	$0.287	$1.147	$0.057	$0.143	-	-	-	-	$0.573	-	-
MistralAI	Mistral 7B Instruct v0.3	32,768	-	$0.088	$0.152	-	-	-	-	-	-	-	-	-
Meta	Llama 3.3 70B Instruct	128,000	-	$0.096	$0.304	-	-	-	-	-	-	-	-	-
OpenAI	GPT OSS 120B	256,000	-	$0.12	$0.48	$0.012	-	-	-	-	-	-	-	-
Qwen	Qwen Plus	1,000,000	≤256k tokens	$0.4	$1.2	$0.4	-	-	-	-	-	-	-	-
Qwen	Qwen Plus	1,000,000	256k–1M tokens	$1.2	$3.6	$1.2	-	-	-	-	-	-	-	-

Code models

Organization	Model name	Context length	Pricing tier	Input (per 1M tokens)	Output (per 1M tokens)	Input (Thinking) (per 1M tokens)	Input (Implicit Cache) (per 1M tokens)	Input (Batch File) (per 1M tokens)	Explicit Cache Creation (per 1M tokens)	Explicit Cache Read (per 1M tokens)	Input (Batch Chat) (per 1M tokens)	Output (Batch File) (per 1M tokens)	Output (Batch Chat) (per 1M tokens)	Quantization
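Qwen	Qwen3 Coder Plus	262,000	≤32k tokens	$1	$5	$0.2	$1.25	$0.1	-	-	-	-	-	-
Qwen	Qwen3 Coder Plus	262,000	32k–128k tokens	$1.8	$9	$0.36	$2.25	$0.18	-	-	-	-	-	-
Qwen	Qwen3 Coder Plus	262,000	128k–256k tokens	$3	$15	$0.6	$3.75	$0.3	-	-	-	-	-	-
Qwen	Qwen3 Coder Plus	262,000	256k–1M tokens	$6	$60	$1.2	$7.5	$0.6	-	-	-	-	-	-
Qwen	Qwen3 Coder Next	262,144	≤32k tokens	$0.3	$1.5	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Coder Next	262,144	32k–128k tokens	$0.5	$2.5	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Coder Next	262,144	128k–256k tokens	$0.8	$4	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Coder 30B A3B Instruct	262,114	≤32k tokens	$0.45	$2.25	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Coder 30B A3B Instruct	262,114	32k–128k tokens	$0.75	$3.75	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Coder 30B A3B Instruct	262,114	128k–256k tokens	$1.2	$6	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Coder 30B A3B Instruct	262,114	256k–1M tokens	$2.4	$14.4	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Coder 480B A35B Instruct	262,114	≤32k tokens	$1.5	$7.5	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Coder 480B A35B Instruct	262,114	32k–128k tokens	$2.7	$13.5	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Coder 480B A35B Instruct	262,114	128k–256k tokens	$4.5	$22.5	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Coder 480B A35B Instruct	262,114	256k–1M tokens	$9	$90	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 Coder Flash	262,114	≤32k tokens	$0.3	$1.5	$0.06	$0.375	$0.03	-	-	-	-	-	-
Qwen	Qwen3 Coder Flash	262,114	32k–128k tokens	$0.5	$2.5	$0.1	$0.625	$0.05	-	-	-	-	-	-
Qwen	Qwen3 Coder Flash	262,114	128k–256k tokens	$0.8	$4	$0.16	$1	$0.08	-	-	-	-	-	-
Qwen	Qwen3 Coder Flash	262,114	256k–1M tokens	$1.6	$9.6	$0.32	$2	$0.16	-	-	-	-	-	-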
Moonshot AI	Kimi K2.7 Code	256,000	-	$1.14	$4.8	$0.228	-	-	-	-	-	-	-	-

Vision models

Organization	Model name	Context length	Pricing tier	Input (per 1M tokens)	Output (per 1M tokens)	Input (Thinking) (per 1M tokens)	Input (Implicit Cache) (per 1M tokens)	Input (Batch File) (per 1M tokens)	Explicit Cache Creation (per 1M tokens)	Explicit Cache Read (per 1M tokens)	Input (Batch Chat) (per 1M tokens)	Output (Batch File) (per 1M tokens)	Output (Batch Chat) (per 1M tokens)	Quantization
MiniMax	MiniMax M3	1,000,000	≤512k tokens	$0.36	$1.44	$0.072	-	-	-	-	-	-	-	-
MiniMax	MiniMax M3	1,000,000	512k–1M tokens	$0.72	$2.88	$0.144	-	-	-	-	-	-	-	-
Qwen	Qwen3.7 Plus	256,000	≤256k tokens	$0.4	$1.6	$0.08	$0.5	$0.04	-	-	-	-	-	-
Qwen	Qwen3.7 Plus	256,000	256k–1M tokens	$1.2	$4.8	$0.24	$1.5	$0.12	-	-	-	-	-	-
Qwen	Qwen3.6 Plus	256,000	≤256k tokens	$0.5	$3	$0.625	$0.05	-	-	-	-	-	-	-
Qwen	Qwen3.6 Plus	256,000	256k–1M tokens	$2	$6	$2.5	$0.2	-	-	-	-	-	-	-
Moonshot AI	Kimi K2.5	256,000	-	$0.574	$3.011	$0.115	$0.718	$0.057	-	-	-	-	-	-
Moonshot AI	Kimi K2.6	256,000	-	$0.8939	$3.7131	$0.1788	$1.1174	$0.0894	-	-	-	-	-	-
Qwen	Qwen3.6 27B	256,000	-	$0.6	$3.6	-	-	-	-	-	-	-	-	-
Qwen	Qwen3.6 35B A3B	256,000	-	$0.375	$2.25	-	-	-	-	-	-	-	-	-
Qwen	Qwen3.5 122B A10B	256,000	-	$0.4	$3.2	-	-	-	-	-	-	-	-	-
Qwen	Qwen3.5 27B	256,000	-	$0.3	$2.4	-	-	-	-	-	-	-	-	-
Qwen	Qwen3.5 35B A3B	256,000	-	$0.25	$2	-	-	-	-	-	-	-	-	-
Qwen	Qwen3.5 397B A17B	256,000	-	$0.6	$3.6	-	-	-	-	-	-	-	-	-
Qwen	Qwen3.5 Flash	1,000,000	-	$0.1	$0.4	$0.125	$0.01	-	-	-	-	-	-	-
Qwen	Qwen3.5 Plus	1,000,000	≤256k tokens	$0.4	$2.4	$0.5	$0.04	-	-	-	-	-	-	-
Qwen	Qwen3.5 Plus	1,000,000	256k–1M tokens	$0.5	$3	$0.625	$0.05	-	-	-	-	-	-	-
Qwen	Qwen3 VL 235B A22B Thinking	256,000	-	$0.4	$4	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 VL 235B A22B Instruct	256,000	-	$0.4	$1.6	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 VL 30B A3B Instruct	256,000	-	$0.104	$0.104	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 VL 8B Instruct	256,000	-	$0.064	$0.064	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 VL Flash	256,000	≤32k tokens	$0.05	-	$0.01	$0.025	$0.0625	$0.005	$0.4	$0.2	-	-	-
Qwen	Qwen3 VL Flash	256,000	32k–128k tokens	$0.075	-	$0.015	$0.038	$0.09375	$0.0075	$0.6	$0.3	-	-	-
Qwen	Qwen3 VL Flash	256,000	128k–256k tokens	$0.12	-	$0.024	$0.06	$0.15	$0.012	$0.96	$0.48	-	-	-
Qwen	Qwen3 VL Plus	256,000	≤32k tokens	$0.2	$1.6	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 VL Plus	256,000	32k–128k tokens	$0.3	$2.4	-	-	-	-	-	-	-	-	-
Qwen	Qwen3 VL Plus	256,000	128k–256k tokens	$0.6	$4.8	-	-	-	-	-	-	-	-	-
Z.ai	GLM 4.5V	202,752	-	$0.72	$2.16	$0.132	-	-	-	-	-	-	-	FP4
Z.ai	GLM 4.6V	202,752	-	$0.36	$1.08	$0.06	-	-	-	-	-	-	-	FP4
Z.ai	GLM 4.6V Flash	128,000	-	$0	$0	$0	-	-	-	-	-	-	-	FP4
Z.ai	GLM 4.6V FlashX	128,000	-	$0.048	$0.48	$0.0048	-	-	-	-	-	-	-	FP4
Z.ai	GLM 5V Turbo	202,752	-	$1.44	$4.8	$0.288	-	-	-	-	-	-	-	FP4

OCR models

Organization	Model name	API model string	Context length	Pricing tier	Input (per 1M tokens)	Output (per 1M tokens)	Input (Thinking) (per 1M tokens)	Input (Implicit Cache) (per 1M tokens)	Input (Batch File) (per 1M tokens)	Explicit Cache Creation (per 1M tokens)	Explicit Cache Read (per 1M tokens)	Input (Batch Chat) (per 1M tokens)	Output (Batch File) (per 1M tokens)	Output (Batch Chat) (per 1M tokens)	Quantization
Tencent Hunyuan	HunyuanOCR		16,000	-	$0.168	$0.28	-	-	-	-	-	-	-	-	-
