About the Provider

OpenAI is the organization behind GPT OSS 120B. They are a major AI research lab and platform provider known for creating influential generative AI models (like the GPT series). With GPT-OSS, OpenAI is extending its technology into the open-source ecosystem, empowering developers and enterprises to run powerful language models without proprietary restrictions.

Model Quickstart

This section helps you quickly get started with the openai/gpt-oss-120b model on the Qubrid AI inferencing platform. To use this model, you need:
  • A valid Qubrid API key
  • Access to the Qubrid inference API
  • Basic knowledge of making API requests in your preferred language
Once authenticated with your API key, you can send inference requests to the openai/gpt-oss-120b model and receive responses based on your input prompts. The example below shows how to call the model from Python; choose the approach that best fits your workflow.
import requests
import json
from pprint import pprint

url = "https://platform.qubrid.com/api/v1/qubridai/chat/completions"
headers = {
    "Authorization": "Bearer <QUBRID_API_KEY>",
    "Content-Type": "application/json",
}

data = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {
            "role": "user",
            "content": "Explain quantum computing to a 5 year old.",
        }
    ],
    "temperature": 0.7,
    "max_tokens": 4096,
    "stream": False,  # set to True to receive server-sent events
    "top_p": 0.8,
}

# stream=True lets the HTTP client read the body incrementally, which
# matters when the endpoint streams server-sent events; it is harmless
# for plain JSON responses.
response = requests.post(url, headers=headers, json=data, stream=True)
content_type = response.headers.get("Content-Type", "")

if "application/json" in content_type:
    # Non-streaming request: the whole reply arrives as one JSON body
    pprint(response.json())
else:
    # Streaming request: parse the server-sent event lines
    for line in response.iter_lines(decode_unicode=True):
        if not line:
            continue
        if line.startswith("data:"):
            payload = line.replace("data:", "").strip()
            if payload == "[DONE]":  # sentinel marking the end of the stream
                break
            try:
                chunk = json.loads(payload)
                pprint(chunk)
            except json.JSONDecodeError:
                print("Raw chunk:", payload)
This produces a response similar to the one below:
{
    "content": "**Imagine a magical playground!**\n\n1. **Normal computers are like a line of tiny LEGO blocks.**  \n   Each block can be either a **red** brick (we call that a 0) **or** a **blue** brick (that’s a 1). By stacking lots of red and blue bricks in different ways, the computer can solve problems.\n\n2. **A quantum computer uses *magic* LEGO blocks.**  \n   These magic blocks can be **both red *and* blue at the same time!** It’s like a block that’s half‑red, half‑blue, and can change to whichever color you need later. This special trick is called **super‑position**.\n\n3. **Even cooler: the magic blocks can be best friends.**  \n   When two blocks become best friends, whatever one does, the other knows instantly, even if they’re far apart. If one block decides to be red, the other will be blue right away. This friendship is called **entanglement**.\n\n4. **Why is this fun?**  \n   Because with many magic blocks that can be red‑and‑blue together and be best friends, the playground can try **many different ways to build a tower all at once**. That means it can solve some puzzles much faster than the regular LEGO line.\n\n5. **In short:**  \n   - Regular computer = lots of ordinary blocks (0 or 1).  \n   - Quantum computer = lots of magical blocks (0 *and* 1 together) that can talk instantly.  \n   - The magic lets it try many solutions at the same time, making some jobs super quick.\n\nSo a quantum computer is like a super‑magical playground where the blocks can be two colors at once and are best friends that always know each other’s secrets! 🌈🧩✨",
    "metrics": {
        "input_tokens": 79,
        "output_tokens": 264,
        "total_time": 3.3656,
        "tps": 78.4446
    },
    "model": "openai/gpt-oss-120b",
    "usage": {
        "completion_tokens": 444,
        "prompt_tokens": 79,
        "prompt_tokens_details": null,
        "total_tokens": 523
    }
}
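For non-streaming requests, the fields shown above can be read straight out of the parsed JSON. A minimal sketch, continuing from the response object in the quickstart example:

result = response.json()

# The generated text is returned under the top-level "content" key
print(result["content"])

# Token accounting and throughput as reported by the platform
usage = result["usage"]
metrics = result["metrics"]
print(f"Prompt tokens: {usage['prompt_tokens']}")
print(f"Completion tokens: {usage['completion_tokens']}")
print(f"Throughput: {metrics['tps']} tokens/sec")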

Model Overview

GPT OSS 120B is the most powerful open-weight model in the gpt-oss family. It is designed for large-scale reasoning, agentic workflows, and long-context tasks while remaining deployable on a single high-end GPU. The model supports configurable reasoning levels, full chain-of-thought access, tool use, and fine-tuning, making it suitable for advanced inference and customization scenarios.

Model at a Glance

  • Model ID: openai/gpt-oss-120b
  • Model Type: Open-weight large language model
  • Architecture: Large-scale Mixture-of-Experts (MoE) with adaptive routing, SwiGLU activations, hierarchical sparse attention, and token-choice MoE for reasoning efficiency
  • Context Length: 256k tokens
  • Model Size: 121.7B parameters
  • Training Data: Extensive multi-domain knowledge corpus with safety-aligned fine-tuning, enterprise & community feedback loops, and agentic task simulation datasets

When to Use?

Use GPT OSS 120B if you need:
  • Long-context reasoning with very large input and output windows
  • Adjustable reasoning depth based on latency and task complexity
  • Full access to the model’s chain-of-thought for debugging and analysis
  • Agentic capabilities such as function calling, web browsing, and Python execution (see the sketch after this list)
  • The ability to fine-tune the model for domain-specific use cases
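Function calling is listed among the model's agentic capabilities, but this guide does not document the exact request schema. The sketch below assumes the endpoint accepts an OpenAI-style tools array alongside the fields used in the quickstart; the get_weather tool is hypothetical, so verify the schema against the platform reference before relying on it.

import requests

url = "https://platform.qubrid.com/api/v1/qubridai/chat/completions"
headers = {
    "Authorization": "Bearer <QUBRID_API_KEY>",
    "Content-Type": "application/json",
}

data = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin right now?"}
    ],
    # Assumption: OpenAI-style function-calling schema; get_weather is a
    # hypothetical tool used only for illustration.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
    "stream": False,
}

response = requests.post(url, headers=headers, json=data)
print(response.json())  # inspect the reply for a tool call to get_weather

If the model decides to call the tool, execute get_weather yourself, append the result as a new message, and send a follow-up request so the model can compose its final answer.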

Inference Parameters

  • Streaming (boolean, default: true): Enable streaming responses for real-time output (see the streaming sketch after this list).
  • Temperature (number, default: 0.7): Controls randomness. Higher values mean more creative but less predictable output.
  • Max Tokens (number, default: 4096): Maximum number of tokens to generate in the response.
  • Top P (number, default: 1): Nucleus sampling; considers only the tokens within the top_p probability mass.
  • Reasoning Effort (select, default: medium): Controls how much reasoning effort the model should apply.
  • Reasoning Summary (select, default: concise): Controls the level of explanation in the reasoning summary.
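Setting "stream": true switches the response to server-sent events, which the quickstart code already parses. The sketch below assembles the streamed text into one string; it assumes each data: chunk carries a top-level "content" fragment mirroring the non-streaming response, so inspect a raw chunk first and adjust the field access if the layout differs.

import json
import requests

url = "https://platform.qubrid.com/api/v1/qubridai/chat/completions"
headers = {
    "Authorization": "Bearer <QUBRID_API_KEY>",
    "Content-Type": "application/json",
}
data = {
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "stream": True,  # request server-sent events instead of a single JSON body
}

# stream=True on the HTTP client yields lines as they arrive
with requests.post(url, headers=headers, json=data, stream=True) as response:
    parts = []
    for line in response.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # sentinel marking the end of the stream
            break
        chunk = json.loads(payload)
        # Assumption: chunks expose a top-level "content" fragment
        parts.append(chunk.get("content", ""))
    print("".join(parts))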

Key Features

  • Configurable Reasoning Effort: Supports low, medium, and high reasoning levels via system prompts (see the sketch after this list)
  • Full Chain-of-Thought Access: Provides visibility into the model’s reasoning process for debugging and trust
  • Agentic Capabilities: Built-in support for function calling, web browsing, Python code execution, and structured outputs
  • Fine-Tunable: Can be customized through parameter fine-tuning
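The reasoning level is set through the system prompt, per the first bullet above. A minimal sketch, reusing url and headers from the quickstart and assuming the gpt-oss convention of a "Reasoning: <level>" line in the system message (the platform's Reasoning Effort parameter may expose the same control):

data = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        # Accepted levels are low, medium, and high; higher levels trade
        # latency for deeper chains of thought.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Plan a three-step data migration and justify each step."},
    ],
    "stream": False,
}

response = requests.post(url, headers=headers, json=data)
print(response.json()["content"])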

Summary

GPT OSS 120B is a high-capacity open-weight language model built for advanced reasoning, long-context tasks, and agentic workflows. With configurable reasoning levels, full chain-of-thought access, tool use, and fine-tuning support, it is well suited for complex inference pipelines. Its Apache 2.0 license and efficient MXFP4 quantization make it accessible for both research and production deployments on modern GPUs.