
About the Provider
Qwen is an AI model family developed by Alibaba Group, a major Chinese technology and cloud computing company. Through its Qwen initiative, Alibaba builds and open-sources advanced language, image, and coding models under permissive licenses to support innovation, developer tooling, and scalable AI integration across applications.
Model Quickstart
This section helps you quickly get started with the Qwen/Qwen3-Coder-30B-A3B-Instruct model on the Qubrid AI inference platform.
To use this model, you need:
- A valid Qubrid API key
- Access to the Qubrid inference API
- Basic knowledge of making API requests in your preferred language
Once these prerequisites are in place, you can send requests to the Qwen/Qwen3-Coder-30B-A3B-Instruct model and receive responses based on your input prompts.
Below are example placeholders showing how the model can be accessed from different programming environments. You can choose the one that best fits your workflow.
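As a concrete starting point, here is a minimal Python sketch. It assumes the Qubrid inference API follows the common OpenAI-compatible chat completions pattern; the base URL below is a placeholder, so substitute the endpoint and API key shown in your Qubrid dashboard.

```python
# Minimal sketch of a chat completion request, assuming an
# OpenAI-compatible endpoint. The base URL is a placeholder;
# use the endpoint from your Qubrid dashboard.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.qubrid.ai/v1",  # placeholder endpoint (assumption)
    api_key="YOUR_QUBRID_API_KEY",
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct",
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
)
print(response.choices[0].message.content)
```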
Model Overview
Qwen3 Coder 30B A3B is a large causal language model designed for code generation and technical reasoning. It belongs to the latest generation of the Qwen model family and supports both thinking mode for complex reasoning and non-thinking mode for efficient general usage within the same model. The model is built using a Mixture-of-Experts (MoE) architecture, activating only a subset of parameters per request to balance performance and efficiency. It is trained through both pretraining and post-training stages and supports long context lengths for complex coding and reasoning workflows.
Model at a Glance
| Feature | Details |
|---|---|
| Model ID | Qwen/Qwen3-Coder-30B-A3B-Instruct |
| Provider | Qwen |
| Model Type | Causal Language Model |
| Architecture | Mixture-of-Experts (MoE) Transformer, 48 layers, GQA attention, 128 experts (8 active per forward pass) |
| Model Size | 30.5B total parameters |
| Activated Parameters | 3.3B per token (8 of 128 experts active) |
When to use?
You should consider using Qwen3 Coder 30B A3B if:
- Your application focuses on code generation or technical reasoning
- You need long context support for large codebases or complex prompts
- You want a model that can switch between deep reasoning and efficient responses
- Your workflow includes agent-based tasks with external tools
- You require multilingual support for technical or coding tasks
Inference Parameters
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output. |
| Temperature | number | 0.7 | Controls randomness; higher values produce more diverse, less deterministic output. |
| Max Tokens | number | 65536 | Maximum tokens to generate in the response, suitable for long-form code or large refactors. |
| Top P | number | 0.8 | Nucleus sampling controlling token sampling diversity. |
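As an illustration, the table's defaults map onto an OpenAI-style streaming request as sketched below, reusing the hypothetical client configured in the quickstart sketch. The wire-level parameter names are an assumption; confirm them against the Qubrid API reference.

```python
# Sketch: the table's defaults expressed as request arguments.
# Names follow the OpenAI-style convention (an assumption).
stream = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct",
    messages=[{"role": "user", "content": "Refactor this function for readability: ..."}],
    stream=True,       # Streaming (default: true)
    temperature=0.7,   # Temperature (default: 0.7)
    max_tokens=65536,  # Max Tokens (default: 65536)
    top_p=0.8,         # Top P (default: 0.8)
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```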
Key Features
- Supports thinking mode for complex reasoning, mathematics, and coding
- Supports non-thinking mode for efficient general-purpose dialogue
- Strong performance in code generation, technical reasoning, and logical tasks
- Designed for agent workflows with tool integration
- Supports multilingual instruction following and translation
Best Practices
Sampling Settings
Thinking Mode (enable_thinking = true):
- Temperature: 0.6
- Top-P: 0.95
- Top-K: 20
- Min-P: 0
Avoid greedy decoding to prevent repetition and degraded performance.
Non-Thinking Mode (enable_thinking = false):
- Temperature: 0.7
- Top-P: 0.8
- Top-K: 20
- Min-P: 0
Output Length
- Recommended output length: 32,768 tokens
- For highly complex math or programming problems: 38,912 tokens
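These recommendations can be bundled into reusable presets, as in the sketch below. It assumes top_k, min_p, and the enable_thinking switch can be passed through an extra_body field, as some OpenAI-compatible servers (for example vLLM) allow; whether Qubrid exposes them the same way is an assumption.

```python
# Sketch of the recommended presets as request kwargs. The extra_body
# pass-through for top_k, min_p, and enable_thinking is an assumption
# borrowed from vLLM-style OpenAI-compatible servers.
THINKING_PRESET = {
    "temperature": 0.6,
    "top_p": 0.95,
    "extra_body": {
        "top_k": 20,
        "min_p": 0,
        "chat_template_kwargs": {"enable_thinking": True},
    },
}
NON_THINKING_PRESET = {
    "temperature": 0.7,
    "top_p": 0.8,
    "extra_body": {
        "top_k": 20,
        "min_p": 0,
        "chat_template_kwargs": {"enable_thinking": False},
    },
}

def ask(client, prompt, preset, max_tokens=32_768):
    # Raise max_tokens to 38_912 for highly complex math or programming problems.
    return client.chat.completions.create(
        model="Qwen/Qwen3-Coder-30B-A3B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        **preset,
    )
```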
Prompt Standardization
Math Problems
Include an instruction in the prompt such as "Please reason step by step, and put your final answer within \boxed{}."
History Management
- Historical responses should include only the final output
- Thinking content should not be stored in conversation history
- This behavior is handled automatically in the provided Jinja2 chat template
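As a client-side sketch of this rule, the helper below strips any reasoning block before a reply is stored in history. It assumes the model marks its thinking content with <think>...</think> tags, as the open-source Qwen3 releases do; the provided Jinja2 chat template performs the same stripping automatically.

```python
import re

# Keep only the final output of each assistant turn; drop the
# <think>...</think> reasoning block (an assumption based on the
# open-source Qwen3 tag convention).
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def append_assistant_turn(history, raw_reply):
    """Store only the final output portion of the model's reply."""
    final_output = THINK_BLOCK.sub("", raw_reply).strip()
    history.append({"role": "assistant", "content": final_output})
    return history

history = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
raw = "<think>Assume sqrt(2) = p/q in lowest terms...</think>Proof: suppose..."
append_assistant_turn(history, raw)
# history now holds only the final output, not the reasoning trace
```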