Qubrid AI
OpenAI · Chat / LLM · 20.9B Parameters · 131K Context
Qubrid Playground · License · Hugging Face
Capabilities: Function Calling · Tool Calling · Streaming · Reasoning · Agent Workflows · Code

Overview

gpt-oss-20b is part of OpenAI’s open-weight gpt-oss series — purpose-built for powerful reasoning, agentic tasks, and versatile developer use cases. At ~21B parameters with a compact Mixture-of-Experts (MoE) architecture, it activates only 3.6B parameters during inference, making it exceptionally fast and efficient for local deployments, low-latency pipelines, and single-GPU setups. With configurable reasoning depth and native function calling support, gpt-oss-20b punches well above its weight class.
Single B200 GPU deployment — production-grade intelligence without the infrastructure overhead. Deploy via Qubrid AI in minutes.

Model Specifications

| Field | Details |
| --- | --- |
| Model ID | openai/gpt-oss-20b |
| Provider | OpenAI |
| Kind | Chat / LLM |
| Architecture | Compact MoE with SwiGLU activations, token-choice routing, alternating attention |
| Model Size | 20.9B params (~3.6B active during inference) |
| Context Length | 131,072 tokens |
| MoE | Yes (token-choice) |
| Release Date | August 2025 |
| License | Apache 2.0 |
| Training Data | Primarily English text with a focus on STEM, coding, and general knowledge |
| Function Calling | Supported |
| Serverless API | Available |
| Fine-tuning | Coming soon |
| On-demand | Coming soon |

Pricing

💳 Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.
| Token Type | Price per 1M Tokens |
| --- | --- |
| Input Tokens | $0.05 |
| Output Tokens | $0.28 |
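At these rates, per-request cost is easy to estimate. A small sketch using the published prices (the token counts in the example are hypothetical):

```python
# Qubrid AI pricing for openai/gpt-oss-20b (USD per 1M tokens)
INPUT_PRICE_PER_M = 0.05
OUTPUT_PRICE_PER_M = 0.28

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 1,000-token completion
cost = estimate_cost(2_000, 1_000)
print(f"${cost:.6f}")  # → $0.000380
```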

Quickstart

Prerequisites

  1. Create a free account at platform.qubrid.com
  2. Generate your API key from the API Keys section
  3. Replace QUBRID_API_KEY in the code below with your actual key
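Rather than hardcoding the key into source, you can read it from an environment variable. A minimal sketch; the variable name `QUBRID_API_KEY` mirrors the placeholder used in the snippets below and is just a convention:

```python
import os

def load_api_key(var: str = "QUBRID_API_KEY") -> str:
    """Read the API key from the environment, failing fast if it is unset."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set the {var} environment variable first")
    return key

# Usage: export QUBRID_API_KEY=... in your shell, then
# client = OpenAI(base_url="https://platform.qubrid.com/v1", api_key=load_api_key())
```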

Python

from openai import OpenAI

# Initialize the OpenAI client with the Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",
)

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms",
        }
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=1,
    stream=True,
)

# With stream=True, print each content delta as it arrives
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

# With stream=False, the call returns a single completion instead;
# replace the loop above with:
# print(stream.choices[0].message.content)
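If you need the full text after streaming, for example to log or post-process it, accumulate the deltas as they arrive. The sketch below uses simple stand-in chunk objects shaped like the SDK's streaming chunks so it runs offline; with a live call you would pass the `stream` object from the quickstart instead:

```python
from types import SimpleNamespace

def collect_stream(stream) -> str:
    """Concatenate the content deltas from a chat-completion stream."""
    parts = []
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)

# Stand-in chunks mimicking the SDK's streaming objects
def fake_chunk(text):
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

fake_stream = [fake_chunk("Hello, "), fake_chunk(None), fake_chunk("world!")]
print(collect_stream(fake_stream))  # → Hello, world!
```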

JavaScript

import OpenAI from "openai";

// Initialize the OpenAI client with Qubrid base URL
const client = new OpenAI({
  baseURL: "https://platform.qubrid.com/v1",
  apiKey: "QUBRID_API_KEY",
});

// Create a streaming chat completion
const stream = await client.chat.completions.create({
  model: "openai/gpt-oss-20b",
  messages: [
    {
      role: "user",
      content: "Explain quantum computing in simple terms",
    },
  ],
  max_tokens: 4096,
  temperature: 0.7,
  top_p: 1,
  stream: true,
});

// With stream: true, write each content delta as it arrives
for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
console.log();

// With stream: false, the call returns a single completion instead;
// replace the loop above with:
// console.log(stream.choices[0].message.content);

Go

package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	url := "https://platform.qubrid.com/v1/chat/completions"

	data := map[string]interface{}{
		"model": "openai/gpt-oss-20b",
		"messages": []map[string]string{
			{
				"role":    "user",
				"content": "Explain quantum computing in simple terms",
			},
		},
		"temperature": 0.7,
		"max_tokens":  4096,
		"stream":      true,
		"top_p":       1,
	}

	jsonData, err := json.Marshal(data)
	if err != nil {
		panic(err)
	}
	req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer QUBRID_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	// With stream=true, each non-empty line is a server-sent event
	// of the form `data: {...}`
	scanner := bufio.NewScanner(res.Body)
	for scanner.Scan() {
		if line := scanner.Text(); line != "" {
			fmt.Println(line)
		}
	}
}

cURL

curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "openai/gpt-oss-20b",
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 4096,
  "stream": true,
  "top_p": 1
}'

Live Example

Prompt: Explain quantum computing in simple terms
Response:
Quantum Computing — Plain and Simple

Classical computers store information as bits: each bit is either 0 or 1.

Quantum computers use qubits, which can be 0, 1, or both simultaneously
(called superposition) — like a coin mid-spin before it lands.

This lets quantum computers evaluate many possible solutions in parallel,
making them dramatically faster for certain problem types:

  - Cryptography and security
  - Drug discovery and molecular simulation
  - Financial optimization and risk modeling
  - Complex logistics and scheduling

Two key principles at work:
  1. Superposition  — qubits exist in multiple states at once
  2. Entanglement   — qubits can be linked so the state of one
                      instantly influences another, enabling
                      massively coordinated computation

The bottom line: classical computers handle most tasks well.
Quantum computers are a specialized tool for problems that would
take even the fastest supercomputers millions of years to crack.
Try it yourself in the Qubrid AI Playground →

Playground Features

The Qubrid AI Playground lets you experiment with gpt-oss-20b directly in your browser — no code, no setup, no cost to explore.

🧠 System Prompt

Define the model’s persona, constraints, and behavior before the conversation begins — ideal for role-specific assistants, domain-locked bots, or output format control.
Example: "You are a concise technical assistant. Always respond in bullet points,
limit answers to 5 bullets max, and never use jargon without explaining it."
Set your system prompt once in the Qubrid Playground and it persists across the entire conversation.
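Outside the Playground, the same behavior comes from a system message placed at the start of the messages array. A minimal sketch of the payload, reusing the example persona above (the user question is hypothetical):

```python
# A system message pins the persona for the whole conversation
messages = [
    {
        "role": "system",
        "content": (
            "You are a concise technical assistant. Always respond in "
            "bullet points, limit answers to 5 bullets max, and never "
            "use jargon without explaining it."
        ),
    },
    {"role": "user", "content": "What is a mutex?"},
]

# Passed as-is to client.chat.completions.create(model=..., messages=messages)
```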

🎯 Few-Shot Examples

Show the model exactly what good looks like — before your real query. No fine-tuning, no retraining. Just examples.
| User Input | Assistant Response |
| --- | --- |
| Write a function to reverse a string in Python | `def reverse_string(s: str) -> str: return s[::-1]` |
| Explain what an API is | An API (Application Programming Interface) is a contract between two software systems that defines how they communicate — what requests are valid and what responses to expect. |
💡 Stack multiple few-shot examples in the Qubrid Playground to progressively refine tone, format, and domain focus — no fine-tuning required.
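Via the API, few-shot examples are simply alternating user/assistant turns placed before the real query. A sketch of a helper that builds such a messages list, seeded with the first row of the table above (the final query is hypothetical):

```python
def with_few_shot(examples, query):
    """Build a messages list: (user, assistant) example pairs, then the real query."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

examples = [
    ("Write a function to reverse a string in Python",
     "def reverse_string(s: str) -> str: return s[::-1]"),
]
messages = with_few_shot(examples, "Write a function to check for palindromes in Python")
# 3 messages: example user turn, example assistant turn, then the real query
```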

Inference Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| Streaming | boolean | true | Enable streaming responses for real-time output |
| Temperature | number | 0.7 | Controls randomness; higher values produce more creative but less predictable output |
| Max Tokens | number | 4096 | Maximum number of tokens to generate in the response |
| Top P | number | 1 | Nucleus sampling: consider only the tokens comprising the top_p probability mass |
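To see what Temperature does mechanically: sampling divides the model's logits by the temperature before the softmax, so values near 0 sharpen the distribution toward the top token and values above 1 flatten it. A toy illustration with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.2)  # sharply favors the top token
warm = softmax_with_temperature(logits, 2.0)  # much closer to uniform
assert cold[0] > warm[0]
```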

Use Cases

  1. Function calling with schemas
  2. Web browsing and browser automation
  3. Agentic tasks
  4. Chain-of-thought reasoning
  5. Local and low-latency deployments
  6. Rapid prototyping and development support
  7. Code generation and optimization
  8. Customer support automation
  9. Content generation and editing
  10. Process automation and workflow optimization
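For the first use case above, function calling follows the OpenAI tools format: each function is described with a JSON Schema, and the model returns structured arguments when it decides to call one. A sketch with a hypothetical `get_weather` function (the name and fields are illustrative, not part of any real API):

```python
import json

# A hypothetical tool definition in the OpenAI tools format
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

# Passed to client.chat.completions.create(model=..., messages=..., tools=tools);
# the model's reply then carries tool_calls with JSON-encoded arguments.
```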

Strengths & Limitations

| Strengths | Limitations |
| --- | --- |
| Compact MoE design with SwiGLU activations for efficient inference | Smaller capacity than the largest frontier models |
| Token-choice MoE optimized for single-GPU efficiency | May require fine-tuning for highly specialized domains |
| Native FP4 quantization for optimal inference speed | MoE architecture adds some complexity to self-hosted setups |
| Single B200 GPU deployment capability | |
| 131K context window with efficient memory usage | |
| Adjustable reasoning effort levels for task-specific optimization | |
| Supports function calling with defined schemas | |
| Apache 2.0 license for commercial use | |

Why Qubrid AI?

  • 🚀 No infrastructure setup — serverless API, pay only for what you use
  • 🔁 OpenAI-compatible — drop-in replacement using the same SDK, just swap the base URL
  • ⚡ Low-latency by design — gpt-oss-20b is optimized for speed; Qubrid’s serverless layer keeps it that way
  • 🧪 Built-in Playground — prototype with system prompts and few-shot examples instantly at platform.qubrid.com
  • 📊 Full observability — API logs and usage tracking built into the Qubrid dashboard
  • 🌐 Multi-language support — Python, JavaScript, Go, cURL out of the box

Resources

| Resource | Link |
| --- | --- |
| 📖 Qubrid Docs | docs.platform.qubrid.com |
| 🎮 Playground | Try gpt-oss-20b live |
| 🔑 API Keys | Get your API Key |
| 🤗 Hugging Face | openai/gpt-oss-20b |
| 💬 Discord | Join the Qubrid Community |

Built with ❤️ by Qubrid AI

Frontier models. Serverless infrastructure. Zero friction.