About the Provider
Alibaba Cloud is the cloud computing arm of Alibaba Group and the creator of the Qwen model family. Through its open-source initiative, Alibaba has released state-of-the-art language and multimodal models under permissive licenses, enabling developers and enterprises to build powerful AI applications across diverse domains and languages.
Model Quickstart
This section helps you quickly get started with the Qwen/Qwen3-VL-235B-A22B-Thinking model on the Qubrid AI inferencing platform.
To use this model, you need:
- A valid Qubrid API key
- Access to the Qubrid inference API
- Basic knowledge of making API requests in your preferred language
Once these prerequisites are in place, you can send requests to the Qwen/Qwen3-VL-235B-A22B-Thinking model and receive responses based on your input prompts.
Below are example placeholders showing how the model can be accessed from different programming environments. You can choose the one that best fits your workflow.
Model Overview
Qwen3-VL-235B-A22B-Thinking is the most powerful vision-language model in the Qwen series.
- With 235B total parameters and 22B active per token, it excels in multimodal STEM and math reasoning, visual agent tasks, GUI automation, spatial perception, long video comprehension, and multilingual OCR across 32 languages.
- Its thinking mode enables deep chain-of-thought reasoning over complex visual inputs, with a 256K native context window expandable to 1M tokens.
Model at a Glance
| Feature | Details |
|---|---|
| Model ID | Qwen/Qwen3-VL-235B-A22B-Thinking |
| Provider | Alibaba Cloud (Qwen Team) |
| Architecture | Sparse MoE Transformer with DeepStack multi-level ViT feature fusion and Interleaved-MRoPE for video temporal reasoning |
| Model Size | 235B Total / 22B Active |
| Context Length | 256K Tokens (up to 1M) |
| Release Date | 2025 |
| License | Apache 2.0 |
| Training Data | Large-scale multimodal dataset across 32 languages; RL post-training with thinking mode for deep reasoning |
When to use?
You should consider using Qwen3-VL-235B-A22B-Thinking if:
- You need visual STEM and math reasoning with deep chain-of-thought
- Your application requires GUI automation or visual agent tasks
- Your use case involves multimodal coding from images or video
- You need long video understanding and temporal reasoning
- Your workflow requires multilingual OCR across 32 languages
- You need 3D grounding and spatial reasoning over visual inputs
Inference Parameters
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Stream tokens back as they are generated instead of waiting for the full response. |
| Temperature | number | 0.7 | Controls randomness in output; lower values are more deterministic, higher values more diverse. |
| Max Tokens | number | 4096 | Maximum number of tokens to generate in the response. |
| Top P | number | 0.9 | Nucleus sampling threshold; the model samples only from the smallest token set whose cumulative probability exceeds this value. |
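These parameters are typically passed in the request body alongside the prompt. The sketch below assumes the snake_case field names (`stream`, `temperature`, `max_tokens`, `top_p`) common to OpenAI-compatible APIs; verify the exact names against the Qubrid documentation.

```python
def with_inference_params(
    payload: dict,
    stream: bool = True,
    temperature: float = 0.7,
    max_tokens: int = 4096,
    top_p: float = 0.9,
) -> dict:
    """Return a copy of a chat payload with sampling parameters attached.

    Field names mirror common OpenAI-compatible APIs; they are an
    assumption here, not confirmed Qubrid parameter names.
    """
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature is usually constrained to [0, 2]")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    return {
        **payload,
        "stream": stream,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
    }
```

For example, `with_inference_params(payload, temperature=0.2, stream=False)` requests more deterministic output returned in a single non-streaming response.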
Key Features
- Thinking Mode: Built-in chain-of-thought reasoning for deep multimodal problem solving across STEM, math, and visual tasks.
- DeepStack Multi-Level ViT Fusion: Multi-level visual feature fusion for fine-grained image and document understanding.
- Interleaved-MRoPE: Advanced positional encoding for precise video temporal reasoning across long sequences.
- 256K Native Context: Supports up to 1M tokens — enabling long video comprehension and large document analysis.
- Rivals Gemini 2.5 Pro: Competitive on perception and multimodal reasoning benchmarks at open-weight scale.
- Multilingual OCR: Accurate text recognition across 32 languages in images and documents.
- Apache 2.0 License: Fully open source with full commercial freedom.
Summary
Qwen3-VL-235B-A22B-Thinking is the flagship vision-language model of the Qwen series, built for deep multimodal reasoning.
- It uses a Sparse MoE Transformer with DeepStack ViT fusion and Interleaved-MRoPE, with 235B total and 22B active parameters per token.
- It rivals Gemini 2.5 Pro on perception benchmarks and leads in GUI automation, visual STEM reasoning, and multilingual OCR.
- The model supports 256K native context (up to 1M), thinking mode for chain-of-thought reasoning, and 32 languages.
- Licensed under Apache 2.0 for full commercial use.