
Moonshot AI · Chat / LLM · 1T Parameters (32B Active) · 256K Context

Tags: Streaming · Reasoning · Chain-of-Thought · Agentic Coding · Tool Orchestration · Long Context

Overview
Kimi K2 Thinking is the flagship open-weights reasoning model from Moonshot AI, a Chinese AI research company focused on building large-scale foundation models with advanced agentic capabilities. It is the first open-source model to outperform leading closed-source models, including GPT-5 and Claude 4.5 Sonnet, on major benchmarks: HLE (44.9%), BrowseComp (60.2%), and SWE-Bench Verified (71.3%). Built on a 1T-parameter sparse MoE architecture with 32B parameters active per token and native INT4 quantization via quantization-aware training (QAT), it runs at 2x the speed of FP8 deployments. The model maintains stable tool use across 200–300 sequential calls within a 256K context window, interleaving chain-of-thought with dynamic tool calling for complex agentic workflows. It is served instantly via the Qubrid AI Serverless API.

🏆 First open-source model to beat GPT-5 and Claude 4.5 Sonnet. 1T MoE. 2x FP8 speed. Deploy on Qubrid AI with no 512GB RAM cluster required.
Model Specifications
| Field | Details |
|---|---|
| Model ID | moonshotai/Kimi-K2-Thinking |
| Provider | Moonshot AI |
| Kind | Chat / LLM |
| Architecture | Sparse MoE Transformer — 1T total / 32B active per token, 61 layers (1 dense), 384 experts (8 selected per token), MLA attention, SwiGLU |
| Parameters | 1T total (32B active per forward pass) |
| Context Length | 256,000 Tokens |
| MoE | Yes (384 experts, 8 selected per token) |
| Release Date | November 2025 |
| License | Modified MIT License |
| Training Data | Large-scale diverse dataset with agentic reasoning trajectories; INT4 Quantization-Aware Training (QAT) in post-training |
| Function Calling | Not supported via the Serverless API |
| Image Support | N/A |
| Serverless API | Available |
| Fine-tuning | Coming Soon |
| On-demand | Coming Soon |
| State | 🟢 Ready |
Pricing
💳 Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.
| Token Type | Price per 1M Tokens |
|---|---|
| Input Tokens | $0.60 |
| Input Tokens (Cached) | $0.30 |
| Output Tokens | $2.50 |
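The per-request cost follows directly from the table above. A minimal sketch for estimating spend (the helper name and the example token counts are ours, purely illustrative):

```python
# Estimate the USD cost of one request from the per-1M-token prices above.
PRICE_PER_M = {"input": 0.60, "cached_input": 0.30, "output": 2.50}

def estimate_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Return the USD cost for a single request, using the pricing table above."""
    return (
        input_tokens * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cached_input"]
        + output_tokens * PRICE_PER_M["output"]
    ) / 1_000_000

# Example: a long agentic turn with 200K fresh input, 50K cached input, 8K output
cost = estimate_cost(200_000, 50_000, 8_000)  # 0.155 USD
```

Note how cached input halves the input cost, which is why cached pricing matters for long agentic sessions that resend the same context every turn.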
Quickstart
Prerequisites
- Create a free account at platform.qubrid.com
- Generate your API key from the API Keys section
- Replace `QUBRID_API_KEY` in the code below with your actual key
⚠️ Temperature note: Always use temperature=1.0 for Kimi K2 Thinking — this is the recommended setting for all tasks and benchmark-consistent performance.
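The Serverless API follows the OpenAI-compatible chat format, so a plain HTTP request is enough to get started. Below is a minimal, stdlib-only Python sketch; the base URL and endpoint path are assumptions, so confirm them at docs.platform.qubrid.com before use:

```python
# Minimal stdlib-only sketch of a chat completion request to Kimi K2 Thinking.
# BASE_URL and the endpoint path are assumptions; confirm in the Qubrid docs.
import json
import os
import urllib.request

BASE_URL = "https://platform.qubrid.com/v1"  # assumed OpenAI-compatible base URL

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for Kimi K2 Thinking."""
    body = {
        "model": "moonshotai/Kimi-K2-Thinking",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,  # recommended setting for Kimi K2 Thinking
        "max_tokens": 16384,
        "top_p": 0.95,
        "stream": False,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

api_key = os.environ.get("QUBRID_API_KEY")
if api_key:  # only send the request when a key is configured
    req = build_request("What are the benefits of renewable energy?", api_key)
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the official OpenAI SDKs for Python, JavaScript, and Go should also work by pointing their base URL at Qubrid and passing your key.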
Live Example
Prompt: What are the benefits of renewable energy?
Playground Features
The Qubrid AI Playground lets you interact with Kimi K2 Thinking directly in your browser — no setup, no code, no cost to explore.

🧠 System Prompt
Define the model’s reasoning depth, role, and tool-use constraints before the conversation begins — essential for long-horizon agentic research workflows and multi-step coding sessions. Set your system prompt once in the Qubrid Playground and it applies across every turn, including stable reasoning state across extended multi-step sessions.
🎯 Few-Shot Examples
Guide the model’s reasoning style and output format with concrete examples — no fine-tuning, no retraining required.

| User Input | Assistant Response |
|---|---|
| Find all bugs in this Python function and fix them | Bug 1 (line 4): Off-by-one error — range(len(arr)) should be range(len(arr)-1). Bug 2 (line 7): Division by zero not handled — add: if denominator == 0: return None. Fixed function: [corrected code] |
| Prove that log₂(3) is irrational | Assume log₂(3) = p/q (rational, lowest terms, p, q > 0). Then 2^(p/q) = 3 → 2^p = 3^q. The left side is even, the right side is odd. Contradiction. Therefore log₂(3) is irrational. ∎ |
💡 Stack multiple few-shot examples in the Qubrid Playground to establish reasoning format and output structure — no fine-tuning required.
Inference Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output |
| Temperature | number | 1 | Recommended temperature is 1.0 for Kimi K2 Thinking |
| Max Tokens | number | 16384 | Maximum number of tokens to generate |
| Top P | number | 0.95 | Controls nucleus sampling |
Use Cases
- Complex agentic research workflows
- Long-horizon coding and debugging
- Advanced mathematical reasoning
- Multi-step tool orchestration
- Autonomous writing and analysis
- Scientific reasoning tasks
Strengths & Limitations
| Strengths | Limitations |
|---|---|
| First open-source model to beat GPT-5 and Claude 4.5 Sonnet on open benchmarks | Requires 512GB+ RAM for full self-hosted deployment |
| 1T MoE with only 32B active per token — frontier reasoning at efficiency | ~600GB model size — large infrastructure needed for self-hosting |
| Native INT4 via QAT — 2x speed vs FP8 with no accuracy loss | Thinking mode means higher latency than non-reasoning models |
| Interleaved chain-of-thought with dynamic tool calling | Temperature must be set to 1.0 for recommended performance |
| Stable across 200–300 sequential tool calls | Function calling not supported via API |
| 256K context window for long-horizon agentic sessions | |
Why Qubrid AI?
- 🚀 No infrastructure setup — 1T MoE served serverlessly, pay only for what you use
- 🔁 OpenAI-compatible — drop-in replacement using the same SDK, just swap the base URL
- 💰 Cached input pricing — $0.30/1M for cached tokens, critical for long agentic sessions with repeated context
- 🧠 Frontier reasoning on demand — access the first open-source model to beat GPT-5 without managing a 600GB deployment
- 🧪 Built-in Playground — prototype with system prompts and few-shot examples instantly at platform.qubrid.com
- 📊 Full observability — API logs and usage tracking built into the Qubrid dashboard
Resources
| Resource | Link |
|---|---|
| 📖 Qubrid Docs | docs.platform.qubrid.com |
| 🎮 Playground | Try Kimi K2 Thinking live |
| 🔑 API Keys | Get your API Key |
| 🤗 Hugging Face | moonshotai/Kimi-K2-Thinking |
| 💬 Discord | Join the Qubrid Community |
Built with ❤️ by Qubrid AI
Frontier models. Serverless infrastructure. Zero friction.