Documentation Index
Fetch the complete documentation index at: https://docs.platform.qubrid.com/llms.txt
Use this file to discover all available pages before exploring further.
Alibaba Cloud · Chat / LLM · 80B Parameters (3.9B Active) · 256K Context

Streaming · Reasoning · Chain-of-Thought · Long Context · Code · Agentic Planning
Overview
Qwen3 Next 80B A3B Thinking is a next-generation foundation model from Alibaba’s Qwen team featuring a revolutionary Hybrid Attention mechanism (Gated DeltaNet + Gated Attention) with High-Sparsity MoE architecture. With 80B total parameters and only 3.9B active per token, it delivers 10x higher throughput than Qwen3-32B on long contexts while outperforming Gemini-2.5-Flash-Thinking on multiple benchmarks. Designed exclusively for deep reasoning tasks, it operates in thinking-only mode — surfacing full chain-of-thought traces before every response. Served instantly via the Qubrid AI Serverless API.
🧠 10x throughput vs Qwen3-32B. Outperforms Gemini-2.5-Flash-Thinking. 3.9B active parameters.
Deploy on Qubrid AI — no infrastructure required.
Model Specifications
| Field | Details |
|---|---|
| Model ID | Qwen/Qwen3-Next-80B-A3B-Thinking |
| Provider | Alibaba Cloud (Qwen Team) |
| Kind | Chat / LLM |
| Architecture | Hybrid Attention (Gated DeltaNet + Gated Attention) with High-Sparsity MoE and Multi-Token Prediction (MTP) |
| Parameters | 80B total (3.9B active per token) |
| Context Length | 256,000 Tokens |
| MoE | Yes (High-Sparsity MoE) |
| Release Date | September 2025 |
| License | Apache 2.0 |
| Training Data | Large-scale multilingual pretraining dataset, fine-tuned with GSPO for thinking |
| Function Calling | Not Supported |
| Image Support | N/A |
| Serverless API | Available |
| Fine-tuning | Coming Soon |
| On-demand | Coming Soon |
| State | 🟢 Ready |
Pricing
💳 Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.
| Token Type | Price per 1M Tokens |
|---|---|
| Input Tokens | $0.20 |
| Output Tokens | $1.80 |
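Using the listed rates, per-request cost is straightforward to estimate. A quick budgeting sketch (the token counts below are illustrative, not measured):

```python
# Per-million-token rates from the pricing table above
INPUT_PRICE_PER_M = 0.20   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.80  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# A reasoning-heavy request: 2K prompt tokens, 8K output tokens
# (thinking traces make output counts high on complex tasks).
print(f"${estimate_cost(2_000, 8_000):.4f}")  # → $0.0148
```

Because the model always emits a chain-of-thought trace, output tokens dominate cost; budget for them accordingly.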
Quickstart
Prerequisites
- Create a free account at platform.qubrid.com
- Generate your API key from the API Keys section
- Replace `QUBRID_API_KEY` in the code below with your actual key
💡 Thinking mode: This model always produces chain-of-thought reasoning traces before its final answer. Plan for higher output token counts on complex tasks — use max_tokens accordingly.
Python
```python
from openai import OpenAI

# Initialize the OpenAI client with the Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",
)

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="Qwen/Qwen3-Next-80B-A3B-Thinking",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms"
        }
    ],
    max_tokens=8192,
    temperature=0.6,
    top_p=0.95,
    stream=True
)

# Streaming (stream=True): print content deltas as they arrive
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print("\n")

# Non-streaming (stream=False): replace the loop above with this line
# print(stream.choices[0].message.content)
```
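Since the model prefixes every answer with its chain-of-thought, you will often want to separate the trace from the final answer. A minimal helper, assuming the `<think>…</think>` delimiters shown in the Live Example below (verify the exact tag format against your actual responses):

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a completed response into (reasoning_trace, final_answer)."""
    start, end = text.find("<think>"), text.find("</think>")
    if start == -1 or end == -1:
        # No trace delimiters found; treat the whole text as the answer
        return "", text.strip()
    trace = text[start + len("<think>"):end].strip()
    answer = text[end + len("</think>"):].strip()
    return trace, answer

trace, answer = split_thinking("<think>Use contradiction.</think>Proof: ...")
print(answer)  # → Proof: ...
```

Logging the trace separately keeps user-facing output clean while preserving the reasoning for debugging and evaluation.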
JavaScript
```javascript
import OpenAI from "openai";

// Initialize the OpenAI client with the Qubrid base URL
const client = new OpenAI({
  baseURL: "https://platform.qubrid.com/v1",
  apiKey: "QUBRID_API_KEY",
});

// Create a streaming chat completion
const stream = await client.chat.completions.create({
  model: "Qwen/Qwen3-Next-80B-A3B-Thinking",
  messages: [
    {
      role: "user",
      content: "Explain quantum computing in simple terms",
    },
  ],
  max_tokens: 8192,
  temperature: 0.6,
  top_p: 0.95,
  stream: true,
});

// Streaming (stream: true): print content deltas as they arrive
for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
console.log("\n");

// Non-streaming (stream: false): replace the loop above with this line
// console.log(stream.choices[0].message.content);
```
Go
```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	url := "https://platform.qubrid.com/v1/chat/completions"
	data := map[string]interface{}{
		"model": "Qwen/Qwen3-Next-80B-A3B-Thinking",
		"messages": []map[string]string{
			{
				"role":    "user",
				"content": "Explain quantum computing in simple terms",
			},
		},
		"temperature": 0.6,
		"max_tokens":  8192,
		"stream":      true,
		"top_p":       0.95,
	}

	jsonData, err := json.Marshal(data)
	if err != nil {
		log.Fatal(err)
	}

	req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer QUBRID_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()

	// Print raw server-sent event lines ("data: {...}") as they arrive
	scanner := bufio.NewScanner(res.Body)
	for scanner.Scan() {
		if line := scanner.Text(); line != "" {
			fmt.Println(line)
		}
	}
}
```
cURL
```bash
curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Next-80B-A3B-Thinking",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "temperature": 0.6,
    "max_tokens": 8192,
    "stream": true,
    "top_p": 0.95
  }'
```
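With `"stream": true`, the raw response is a stream of Server-Sent Events in the OpenAI-compatible format: each line is `data: <json chunk>`, terminated by `data: [DONE]`. A sketch of extracting the text deltas without the SDK (the sample payload below is hand-written for illustration, not a captured response):

```python
import json

def parse_sse_deltas(lines):
    """Yield content deltas from OpenAI-style SSE lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and non-data fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Hand-written sample stream for illustration
sample = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_deltas(sample)))  # → Hello world
```

This is the same parsing the Go example above would need if you want structured output instead of raw SSE lines.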
Live Example
Prompt: Prove that the square root of 2 is irrational
Response:
<think>
This is a classic proof by contradiction. I'll use the standard approach:
assume √2 is rational, express it as a fraction in lowest terms, and
derive a contradiction by showing both numerator and denominator must
be even — contradicting the assumption that the fraction is in lowest terms.
</think>
Proof: √2 is irrational
Assume for contradiction that √2 is rational.
Then √2 = p/q where p, q are integers with no common factors (lowest terms).
Step 1: Square both sides
2 = p²/q²
→ p² = 2q²
Step 2: p² is even, therefore p is even
Since p² = 2q², p² is divisible by 2.
If p² is even, then p must be even (odd × odd = odd, so odd² is always odd).
Let p = 2k for some integer k.
Step 3: Substitute back
(2k)² = 2q²
4k² = 2q²
q² = 2k²
Step 4: q² is even, therefore q is even
By the same logic as Step 2, q must also be even.
Step 5: Contradiction
Both p and q are even — meaning they share a common factor of 2.
This contradicts our assumption that p/q is in lowest terms.
Conclusion:
Our assumption that √2 is rational leads to a contradiction.
Therefore √2 is irrational. ∎
Try it yourself in the Qubrid AI Playground →
Playground Features
The Qubrid AI Playground lets you interact with Qwen3 Next 80B Thinking directly in your browser — no setup, no code, no cost to explore.
🧠 System Prompt
Define the model’s reasoning approach, domain focus, and output format before the conversation begins. Particularly powerful for mathematical proofs, agentic planning, and structured analytical tasks.
Example: "You are an expert mathematical reasoning assistant. Work through
every problem step by step, state all assumptions explicitly, and present
your final answer in a clearly marked conclusion block."
Set your system prompt once in the Qubrid Playground and it applies across every turn of the conversation.
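In the API, the Playground's system prompt is simply the first message with role `"system"`. A minimal sketch of building the payload (no request is sent here; this only constructs the `messages` list used with `client.chat.completions.create` from the Quickstart):

```python
# System prompt taken from the example above; the user question is
# a hypothetical stand-in for your own task.
messages = [
    {
        "role": "system",
        "content": (
            "You are an expert mathematical reasoning assistant. Work through "
            "every problem step by step, state all assumptions explicitly, and "
            "present your final answer in a clearly marked conclusion block."
        ),
    },
    {"role": "user", "content": "Prove that the sum of two even numbers is even."},
]
print(messages[0]["role"])  # → system
```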
🎯 Few-Shot Examples
Guide the model’s reasoning style and output structure with concrete examples — no fine-tuning, no retraining required.
| User Input | Assistant Response |
|---|---|
| Is 97 a prime number? | Yes. 97 is prime. Check divisibility by all primes ≤ √97 ≈ 9.8: not divisible by 2, 3, 5, or 7. Therefore 97 has no factors other than 1 and itself. |
| Write a Python function to check if a string is a palindrome | `def is_palindrome(s: str) -> bool: s = s.lower().replace(" ", ""); return s == s[::-1]` |
💡 Stack multiple few-shot examples in the Qubrid Playground to establish your preferred reasoning format and output structure — no fine-tuning required.
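In the API, few-shot examples map to alternating `user`/`assistant` messages placed before the real question. A sketch using the first example from the table above (the follow-up question is a hypothetical stand-in; no request is sent, this only builds the payload):

```python
# One worked example pair, followed by the real question
few_shot = [
    {"role": "user", "content": "Is 97 a prime number?"},
    {
        "role": "assistant",
        "content": (
            "Yes. 97 is prime. Check divisibility by all primes <= sqrt(97) "
            "~ 9.8: not divisible by 2, 3, 5, or 7."
        ),
    },
]
messages = few_shot + [{"role": "user", "content": "Is 221 a prime number?"}]
print(len(messages))  # → 3
```

Stacking more example pairs before the final user message gives the model a stronger pattern to imitate.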
Inference Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output |
| Temperature | number | 0.6 | Controls randomness. Lower values recommended for reasoning tasks |
| Max Tokens | number | 8192 | Maximum number of tokens to generate |
| Top P | number | 0.95 | Nucleus sampling parameter |
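The table's defaults can be collected into keyword arguments for `client.chat.completions.create(...)`, with per-call overrides on top. A small sketch (the override values shown are illustrative suggestions, not documented requirements):

```python
# Documented defaults from the Inference Parameters table
DEFAULTS = {
    "stream": True,
    "temperature": 0.6,
    "max_tokens": 8192,
    "top_p": 0.95,
}

def request_params(**overrides):
    """Merge caller overrides onto the documented defaults."""
    return {**DEFAULTS, **overrides}

# e.g. lower temperature for strict math or code tasks
print(request_params(temperature=0.2)["temperature"])  # → 0.2
```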
Use Cases
- Complex multi-step reasoning
- Mathematical proofs
- Code synthesis
- Logical analysis
- Agentic planning
- Long-context document analysis
Strengths & Limitations
| Strengths | Limitations |
|---|---|
| Hybrid Attention (Gated DeltaNet + Gated Attention) for efficient long-context processing | Thinking mode only — no fast non-thinking mode available |
| 10x throughput vs Qwen3-32B on 32K+ contexts | Longer thinking traces increase latency on complex tasks |
| Only 3.9B active parameters from 80B total (efficient inference) | New architecture with limited community tooling support |
| Native 256K context window | Function calling not supported |
| Outperforms Gemini-2.5-Flash-Thinking on multiple benchmarks | |
| Apache 2.0 — fully open-source with commercial use | |
Why Qubrid AI?
- 🚀 No infrastructure setup — 80B MoE served serverlessly, pay only for what you use
- 🔁 OpenAI-compatible — drop-in replacement using the same SDK, just swap the base URL
- 🧠 Reasoning at scale — Qwen3 Next’s 10x throughput advantage is fully realized on Qubrid’s low-latency infrastructure
- 🧪 Built-in Playground — prototype with system prompts and few-shot examples instantly at platform.qubrid.com
- 📊 Full observability — API logs and usage tracking built into the Qubrid dashboard
- 🌐 Multi-language support — Python, JavaScript, Go, cURL out of the box
Resources
Built with ❤️ by Qubrid AI
Frontier models. Serverless infrastructure. Zero friction.