GPT OSS 120B

About the Provider

OpenAI, the AI research lab and platform provider known for the GPT series, is the organization behind GPT OSS 120B. With GPT-OSS, OpenAI is extending its technology into the open-weight ecosystem, so developers and enterprises can run powerful language models without proprietary restrictions.

Model Quickstart

This section helps you quickly get started with the openai/gpt-oss-120b model on the Qubrid AI inferencing platform. To use this model, you need:

A valid Qubrid API key
Access to the Qubrid inference API
Basic knowledge of making API requests in your preferred language

Once authenticated with your API key, you can send inference requests to the openai/gpt-oss-120b model and receive responses based on your input prompts. The example below uses the OpenAI Python SDK pointed at the Qubrid base URL. Set STREAM to choose between streaming and non-streaming output.

from openai import OpenAI

# Initialize the OpenAI client with the Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",  # Replace with your Qubrid API key
)

# Set to False to receive the full response in a single object
STREAM = True

# Create a chat completion (streamed when STREAM is True)
response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms",
        }
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=1,
    stream=STREAM,
)

if STREAM:
    # Streaming: print each text delta as it arrives
    for chunk in response:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()
else:
    # Non-streaming: the full message is available at once
    print(response.choices[0].message.content)

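This produces a response similar to the one below (the exact text varies between runs because sampling is not deterministic at temperature 0.7):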

**Quantum Computing in Plain English**

---

### 1. The ordinary computer you already know  
- **Bits** are the tiny switches inside a laptop or phone.  
- Each bit is either **0** *or* **1** (think of a light that's either off or on).  
- By wiring billions of bits together and flipping them in clever patterns, we can do calculations, run games, browse the web, etc.

---

### 2. The "quantum" twist  
A quantum computer uses **qubits** (quantum bits) instead of ordinary bits...

(rest of the output)
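
If you need the streamed text as a single string rather than printed output, the chunks can be accumulated as they arrive. The following is a minimal sketch, not part of the Qubrid documentation: `collect_stream` and `fake_chunk` are hypothetical helper names, and the fake chunks only imitate the `choices[0].delta.content` shape used in the quickstart so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices or an empty delta; skip them.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like one streamed chunk (offline use only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream: two text deltas, one empty delta, one chunk without choices.
demo = [
    fake_chunk("Quantum "),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk("computing"),
]
print(collect_stream(demo))
```

With a real client, the object returned by `client.chat.completions.create(..., stream=True)` would be passed to `collect_stream` in place of `demo`.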

Model Overview

GPT OSS 120B is the most powerful open-weight model in the gpt-oss family. It is designed for large-scale reasoning, agentic workflows, and long-context tasks while remaining deployable on a single high-end GPU. The model supports configurable reasoning levels, full chain-of-thought access, tool use, and fine-tuning, making it suitable for advanced inference and customization scenarios.

Model at a Glance

Feature	Details
Model ID	`openai/gpt-oss-120b`
Model Type	Open-weight large language model
Architecture	Large-Scale Mixture-of-Experts (MoE) with adaptive routing, SwiGLU activations, hierarchical sparse attention, and token-choice MoE for reasoning efficiency
Context Length	256k Tokens
Model Size	121.7B Params
Inference Parameters	6
Training Data	Extensive multi-domain knowledge corpus with safety-aligned fine-tuning, enterprise & community feedback loops, and agentic task simulation datasets

When to use?

Use GPT OSS 120B if you need:

Long-context reasoning with very large input and output windows
Adjustable reasoning depth based on latency and task complexity
Full access to the model’s chain-of-thought for debugging and analysis
Agentic capabilities such as function calling, web browsing, and Python execution
The ability to fine-tune the model for domain-specific use cases
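
The function-calling capability listed above can be sketched with the tool format of the OpenAI Chat Completions API. This is an illustrative sketch under assumptions: the `get_weather` tool, its schema, and the `run_tool_call` helper are hypothetical, and it is assumed (not confirmed by this page) that the Qubrid endpoint accepts the `tools` request field. The example runs offline by simulating the name and arguments a model would return.

```python
import json

# Hypothetical tool: the name, description, and schema are illustrative only.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real tool would query a weather service."""
    return {"city": city, "temperature_c": 21}


# Maps tool names advertised to the model onto local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call and return its result as a JSON string."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(arguments_json)
    return json.dumps(TOOL_REGISTRY[name](**kwargs))


# Simulate the name/arguments pair a model would return in a tool call.
print(run_tool_call("get_weather", '{"city": "Paris"}'))
```

In a live request, `tools=[WEATHER_TOOL]` would be passed to `client.chat.completions.create`, and with the OpenAI Python SDK each requested call would be read from `response.choices[0].message.tool_calls`, where `function.name` and `function.arguments` supply the two inputs to `run_tool_call`.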

Inference Parameters

Parameter Name	Type	Default	Description
Streaming	boolean	true	Enable streaming responses for real-time output.
Temperature	number	0.7	Controls randomness. Higher values mean more creative but less predictable output.
Max Tokens	number	4096	Maximum number of tokens to generate in the response.
Top P	number	1	Nucleus sampling: samples only from the smallest set of tokens whose cumulative probability reaches top_p.
Reasoning Effort	select	medium	Controls how much reasoning effort the model applies: low, medium, or high.
Reasoning Summary	select	concise	Controls the level of explanation in the reasoning summary.
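
The defaults in the table above can be kept in one place and merged with per-request overrides. The sketch below is one possible way to do that, not an official client: `build_request` is a hypothetical helper, and the validation bounds are assumptions rather than documented limits. Reasoning Effort and Reasoning Summary are left out because this page does not state their request field names.

```python
# Defaults taken from the Inference Parameters table.
DEFAULTS = {
    "stream": True,
    "temperature": 0.7,
    "max_tokens": 4096,
    "top_p": 1,
}


def build_request(prompt: str, **overrides) -> dict:
    """Merge caller overrides into the documented defaults and validate them."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    # The bounds below are assumptions, not limits stated by the platform.
    if not 0 < params["top_p"] <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if params["temperature"] < 0:
        raise ValueError("temperature must be non-negative")
    if params["max_tokens"] < 1:
        raise ValueError("max_tokens must be at least 1")
    return {
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": prompt}],
        **params,
    }


request = build_request("Summarize this paragraph.", temperature=0.2)
print(request["temperature"], request["max_tokens"])
```

The resulting dictionary can be unpacked into the SDK call from the quickstart, for example `client.chat.completions.create(**request)`.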

Key Features

Configurable Reasoning Effort: Supports low, medium, and high reasoning levels via system prompts
Full Chain-of-Thought Access: Provides visibility into the model’s reasoning process for debugging and trust
Agentic Capabilities: Built-in support for function calling, web browsing, Python code execution, and structured outputs
Fine-Tunable: Can be customized through parameter fine-tuning
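
As a rough illustration of setting the reasoning level through the system prompt, the sketch below builds a message list that starts with a `Reasoning: <level>` line, the convention described in the gpt-oss model card. `with_reasoning` is a hypothetical helper, and it is an assumption that the Qubrid endpoint forwards the system message to the model unchanged; the platform's Reasoning Effort parameter may be the preferred control.

```python
VALID_LEVELS = ("low", "medium", "high")


def with_reasoning(prompt: str, level: str = "medium") -> list:
    """Return chat messages whose system prompt requests a reasoning level."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}")
    return [
        {"role": "system", "content": f"Reasoning: {level}"},
        {"role": "user", "content": prompt},
    ]


messages = with_reasoning("Prove that 17 is prime.", level="high")
print(messages[0]["content"])
```

The returned list can be passed as the `messages` argument of the quickstart call. Lower levels trade reasoning depth for latency, so `low` suits simple lookups while `high` suits multi-step problems.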

Summary

GPT OSS 120B is a high-capacity open-weight language model built for advanced reasoning, long-context tasks, and agentic workflows. With configurable reasoning levels, full chain-of-thought access, tool use, and fine-tuning support, it is well suited for complex inference pipelines. Its Apache 2.0 license and efficient MXFP4 quantization make it accessible for both research and production deployments on modern GPUs.
