
About the Provider
OpenAI is the organization behind GPT OSS 120B. They are a major AI research lab and platform provider known for creating influential generative AI models (like the GPT series). With GPT-OSS, OpenAI is extending its technology into the open-source ecosystem, empowering developers and enterprises to run powerful language models without proprietary restrictions.
Model Quickstart
This section helps you quickly get started with the openai/gpt-oss-120b model on the Qubrid AI inferencing platform.
To use this model, you need:
- A valid Qubrid API key
- Access to the Qubrid inference API
- Basic knowledge of making API requests in your preferred language
With these in place, you can send requests to the openai/gpt-oss-120b model and receive responses based on your input prompts.
Below are example placeholders showing how the model can be accessed using different programming environments. You can choose the one that best fits your workflow.
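As a concrete starting point, here is a minimal sketch in Python, assuming Qubrid exposes an OpenAI-compatible chat completions route. The base URL, endpoint path, and response shape below are placeholders, not confirmed values; substitute the ones from the Qubrid API reference.

```python
# Minimal sketch of calling openai/gpt-oss-120b over HTTP.
# ASSUMPTIONS: the base URL and the /chat/completions route are
# placeholders -- replace them with the values from the Qubrid API docs.
import os

import requests

QUBRID_API_KEY = os.environ["QUBRID_API_KEY"]   # your Qubrid API key
BASE_URL = "https://api.qubrid.com/v1"          # placeholder base URL

response = requests.post(
    f"{BASE_URL}/chat/completions",             # assumed OpenAI-style route
    headers={"Authorization": f"Bearer {QUBRID_API_KEY}"},
    json={
        "model": "openai/gpt-oss-120b",
        "messages": [
            {"role": "user", "content": "Summarize the benefits of open-weight models."}
        ],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

If the request succeeds, the generated text is returned in the response body; with streaming enabled (see Inference Parameters below), tokens arrive incrementally instead.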
Model Overview
GPT OSS 120B is the most powerful open-weight model in the gpt-oss family. It is designed for large-scale reasoning, agentic workflows, and long-context tasks while remaining deployable on a single high-end GPU. The model supports configurable reasoning levels, full chain-of-thought access, tool use, and fine-tuning, making it suitable for advanced inference and customization scenarios.
Model at a Glance
| Feature | Details |
|---|---|
| Model ID | openai/gpt-oss-120b |
| Model Type | Open-weight large language model |
| Architecture | Large-scale Mixture-of-Experts (MoE) with token-choice adaptive routing, SwiGLU activations, and hierarchical sparse attention for reasoning efficiency |
| Context Length | 256k tokens |
| Model Size | 121.7B parameters |
| Inference Parameters | 6 (see table below) |
| Training Data | Extensive multi-domain knowledge corpus with safety-aligned fine-tuning, enterprise & community feedback loops, and agentic task simulation datasets |
When to use?
Use GPT OSS 120B if you need:
- Long-context reasoning with very large input and output windows
- Adjustable reasoning depth based on latency and task complexity
- Full access to the model’s chain-of-thought for debugging and analysis
- Agentic capabilities such as function calling, web browsing, and Python execution
- The ability to fine-tune the model for domain-specific use cases
Inference Parameters
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output. |
| Temperature | number | 0.7 | Controls randomness. Higher values mean more creative but less predictable output. |
| Max Tokens | number | 4096 | Maximum number of tokens to generate in the response. |
| Top P | number | 1 | Nucleus sampling: restricts sampling to the smallest set of tokens whose cumulative probability mass reaches top_p. |
| Reasoning Effort | select | medium | Controls how much reasoning effort the model applies (low, medium, or high). |
| Reasoning Summary | select | concise | Controls how detailed the model's reasoning summary is. |
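To make the mapping concrete, the sketch below shows how these six parameters might appear in a request body. The snake_case field names are assumptions modeled on common OpenAI-style APIs, not confirmed Qubrid field names.

```python
# Hypothetical request body exercising all six inference parameters.
# ASSUMPTION: the snake_case field names mirror OpenAI-style APIs and
# may differ in the actual Qubrid schema.
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Explain nucleus sampling in one paragraph."}
    ],
    "stream": True,                  # Streaming: emit tokens as they are generated
    "temperature": 0.7,              # Temperature: sampling randomness
    "max_tokens": 4096,              # Max Tokens: cap on generated tokens
    "top_p": 1,                      # Top P: nucleus sampling probability mass
    "reasoning_effort": "medium",    # Reasoning Effort: low | medium | high
    "reasoning_summary": "concise",  # Reasoning Summary: summary detail level
}
```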
Key Features
- Configurable Reasoning Effort: Supports low, medium, and high reasoning levels via system prompts
- Full Chain-of-Thought Access: Provides visibility into the model’s reasoning process for debugging and trust
- Agentic Capabilities: Built-in support for function calling, web browsing, Python code execution, and structured outputs (see the sketch after this list)
- Fine-Tunable: Can be customized through parameter fine-tuning
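To illustrate the reasoning and agentic controls above, here is a hedged sketch of a single request that sets the reasoning level through a system prompt and declares one OpenAI-style function tool. The get_weather tool is purely hypothetical, and whether Qubrid accepts this exact tools schema is an assumption.

```python
# Sketch of an agentic request: the system prompt selects the reasoning
# level, and an OpenAI-style tool definition enables function calling.
# ASSUMPTIONS: the tools schema and the get_weather tool are illustrative,
# not confirmed parts of the Qubrid API.
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {"role": "system", "content": "Reasoning: high"},  # reasoning level via system prompt
        {"role": "user", "content": "What is the weather in Paris right now?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Fetch the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
```

If the model chooses to call the tool, the response carries a tool call for the client to execute; the result is appended to the conversation and the request is re-sent so the model can produce its final answer.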