
OpenAI · Chat / LLM · 20.9B Parameters · 131K Context

Function Calling · Tool Calling · Streaming · Reasoning · Agent Workflows · Code

Overview
gpt-oss-20b is part of OpenAI's open-weight gpt-oss series, purpose-built for powerful reasoning, agentic tasks, and versatile developer use cases. At ~21B parameters with a compact Mixture-of-Experts (MoE) architecture, it activates only ~3.6B parameters per token during inference, making it exceptionally fast and efficient for local deployments, low-latency pipelines, and single-GPU setups. With configurable reasoning depth and native function-calling support, gpt-oss-20b punches well above its weight class.

⚡ Single B200 GPU deployment: production-grade intelligence without the infrastructure overhead. Deploy via Qubrid AI in minutes.
Model Specifications
| Field | Details |
|---|---|
| Model ID | openai/gpt-oss-20b |
| Provider | OpenAI |
| Kind | Chat / LLM |
| Architecture | Token-choice Mixture-of-Experts with SwiGLU activations and alternating attention layers |
| Model Size | 20.9B Params (~3.6B active during inference) |
| Context Length | 131,072 Tokens |
| MoE | Yes |
| Release Date | August 2025 |
| License | Apache 2.0 |
| Training Data | Not publicly detailed; OpenAI describes a text-only dataset with emphasis on STEM, coding, and general knowledge |
| Function Calling | Supported |
| Serverless API | Available |
| Fine-tuning | Coming Soon |
| On-demand | Coming Soon |
Pricing
💳 Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.
| Token Type | Price per 1M Tokens |
|---|---|
| Input Tokens | $0.05 |
| Output Tokens | $0.28 |
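Per-request cost is a straight multiply-and-add against the rates above. A quick sketch (the 10k/2k token counts are just an illustrative example):

```python
INPUT_PER_M = 0.05    # USD per 1M input tokens
OUTPUT_PER_M = 0.28   # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the published per-token rates."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# e.g. a 10k-token prompt with a 2k-token reply:
cost = request_cost(10_000, 2_000)   # 0.0005 + 0.00056 = 0.00106 USD
```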
Quickstart
Prerequisites
- Create a free account at platform.qubrid.com
- Generate your API key from the API Keys section
- Replace `QUBRID_API_KEY` in the code below with your actual key
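A minimal Python sketch of a chat completion call, using only the standard library. The base URL below is an assumption; confirm the exact endpoint at docs.platform.qubrid.com.

```python
import json
import os
import urllib.request

# NOTE: placeholder base URL -- check docs.platform.qubrid.com for the exact value.
BASE_URL = "https://platform.qubrid.com/v1"

def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for gpt-oss-20b."""
    payload = {
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "Explain quantum computing in simple terms",
    os.environ.get("QUBRID_API_KEY", "<QUBRID_API_KEY>"),
)
# with urllib.request.urlopen(req) as resp:                  # uncomment to send
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the API is OpenAI-compatible, the official `openai` SDK also works: point its `base_url` at the same endpoint and keep the rest of your code unchanged.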
Playground Features
The Qubrid AI Playground lets you experiment with gpt-oss-20b directly in your browser: no code, no setup, no cost to explore.

🧠 System Prompt
Define the model's persona, constraints, and behavior before the conversation begins. This is ideal for role-specific assistants, domain-locked bots, or output format control.

Set your system prompt once in the Qubrid Playground and it persists across the entire conversation.
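In code, the same effect comes from pinning a `system` message at the head of the message list, so it rides along with every request. A minimal sketch (the example prompt is illustrative):

```python
def make_conversation(system_prompt: str):
    """Keep a system message at the head of the history for every turn."""
    history = [{"role": "system", "content": system_prompt}]

    def ask(text: str) -> list:
        history.append({"role": "user", "content": text})
        return history

    return ask

ask = make_conversation("You are a terse SQL tutor. Answer in one sentence.")
messages = ask("What does GROUP BY do?")
# messages[0] is still the system prompt on every turn
```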
🎯 Few-Shot Examples
Show the model exactly what good looks like, before your real query. No fine-tuning, no retraining. Just examples.

| User Input | Assistant Response |
|---|---|
| Write a function to reverse a string in Python | `def reverse_string(s: str) -> str: return s[::-1]` |
| Explain what an API is | An API (Application Programming Interface) is a contract between two software systems that defines how they communicate: what requests are valid and what responses to expect. |
💡 Stack multiple few-shot examples in the Qubrid Playground to progressively refine tone, format, and domain focus — no fine-tuning required.
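Over the API, few-shot examples are simply alternating user/assistant turns placed before the real query. A sketch using the table's first example (the final query is illustrative):

```python
def few_shot_messages(examples: list, query: str) -> list:
    """Interleave (input, output) example pairs before the real query."""
    msgs = []
    for user_input, assistant_output in examples:
        msgs.append({"role": "user", "content": user_input})
        msgs.append({"role": "assistant", "content": assistant_output})
    msgs.append({"role": "user", "content": query})
    return msgs

msgs = few_shot_messages(
    [("Write a function to reverse a string in Python",
      "def reverse_string(s: str) -> str: return s[::-1]")],
    "Write a function to deduplicate a list in Python",
)
```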
Inference Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output |
| Temperature | number | 0.7 | Controls randomness. Higher values mean more creative but less predictable output |
| Max Tokens | number | 4096 | Maximum number of tokens to generate in the response |
| Top P | number | 1 | Nucleus sampling: considers tokens with top_p probability mass |
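These parameters travel in the request body. A small sketch that merges the table's defaults with per-call overrides, assuming the OpenAI-compatible chat completions payload shape:

```python
# Defaults mirror the table above.
DEFAULTS = {"stream": True, "temperature": 0.7, "max_tokens": 4096, "top_p": 1}

def chat_payload(messages: list, **overrides) -> dict:
    """Merge per-call overrides onto the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"model": "openai/gpt-oss-20b", "messages": messages,
            **DEFAULTS, **overrides}

payload = chat_payload(
    [{"role": "user", "content": "Summarize MoE routing in two sentences."}],
    temperature=0.2,   # more deterministic output
    stream=False,      # wait for the full response instead of streaming
)
```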
Use Cases
- Function calling with schemas
- Web browsing and browser automation
- Agentic tasks
- Chain-of-thought reasoning
- Local and low-latency deployments
- Rapid prototyping and development support
- Code generation and optimization
- Customer support automation
- Content generation and editing
- Process automation and workflow optimization
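For the function-calling use case, tool schemas follow the OpenAI-compatible `tools` format. A minimal sketch with a hypothetical `get_weather` tool and a local dispatcher (the tool name, schema, and stub return value are all illustrative assumptions):

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_call: dict) -> dict:
    """Route a model-issued tool call to a local handler (stubbed here)."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "get_weather":
        return {"city": args["city"], "temp_c": 21}  # stub value, not real data
    raise KeyError(f"no handler for tool: {name}")

# The model returns tool calls with JSON-encoded arguments:
result = dispatch({"function": {"name": "get_weather",
                                "arguments": '{"city": "Berlin"}'}})
```

In an agent loop, you would send `TOOLS` with the request, run `dispatch` on each returned tool call, and feed the results back as `tool` messages.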
Strengths & Limitations
| Strengths | Limitations |
|---|---|
| Compact MoE design with SwiGLU activations for efficient inference | Smaller capacity than largest frontier models |
| Token-choice MoE optimized for single-GPU efficiency | May require fine-tuning for highly specialized domains |
| Native MXFP4 quantization for optimal inference speed | MoE architecture adds some complexity to self-hosted setups |
| Single B200 GPU deployment capability | |
| 131K context window with efficient memory usage | |
| Adjustable reasoning effort levels for task-specific optimization | |
| Supports function calling with defined schemas | |
| Apache 2.0 license for commercial use | |
Why Qubrid AI?
- 🚀 No infrastructure setup — serverless API, pay only for what you use
- 🔁 OpenAI-compatible — drop-in replacement using the same SDK, just swap the base URL
- ⚡ Low-latency by design — gpt-oss-20b is optimized for speed; Qubrid’s serverless layer keeps it that way
- 🧪 Built-in Playground — prototype with system prompts and few-shot examples instantly at platform.qubrid.com
- 📊 Full observability — API logs and usage tracking built into the Qubrid dashboard
- 🌐 Multi-language support — Python, JavaScript, Go, cURL out of the box
Resources
| Resource | Link |
|---|---|
| 📖 Qubrid Docs | docs.platform.qubrid.com |
| 🎮 Playground | Try gpt-oss-20b live |
| 🔑 API Keys | Get your API Key |
| 🤗 Hugging Face | openai/gpt-oss-20b |
| 💬 Discord | Join the Qubrid Community |
Built with ❤️ by Qubrid AI
Frontier models. Serverless infrastructure. Zero friction.