About the Provider

Alibaba Cloud is the cloud computing arm of Alibaba Group and the creator of the Qwen model family. Through its open-source initiative, Alibaba has released state-of-the-art language and multimodal models under permissive licenses, enabling developers and enterprises to build powerful AI applications across diverse domains and languages.

Model Quickstart

This section helps you quickly get started with the Qwen/Qwen3-VL-235B-A22B-Instruct model on the Qubrid AI inferencing platform. To use this model, you need:
  • A valid Qubrid API key
  • Access to the Qubrid inference API
  • Basic knowledge of making API requests in your preferred language
Once authenticated with your API key, you can send inference requests to the Qwen/Qwen3-VL-235B-A22B-Instruct model and receive responses based on your input prompts. The example below shows how to access the model from Python using the OpenAI-compatible client.
from openai import OpenAI

# Initialize the OpenAI client with Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",
)

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="Qwen/Qwen3-VL-235B-A22B-Instruct",
    messages=[
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image? Describe the main elements."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ],
    max_tokens=8962,
    temperature=0.7,
    top_p=1,
    stream=True
)

# Streaming output: print tokens as they arrive
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

# If you set stream=False above, read the full response instead:
# print(stream.choices[0].message.content)
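For batch or non-interactive workflows, the same request can be made without streaming. The sketch below assumes the same Qubrid base URL and model ID as above; it wraps the message construction in a small helper (`build_vision_messages` is our own name, not part of the Qubrid API) and leaves the network call commented out so the snippet stands alone without a live API key.

```python
def build_vision_messages(prompt: str, image_url: str) -> list:
    """Build a single-turn multimodal message list in the
    OpenAI chat-completions format used in the example above."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

# Non-streaming call (uncomment and supply a real API key to run):
# from openai import OpenAI
# client = OpenAI(base_url="https://platform.qubrid.com/v1", api_key="QUBRID_API_KEY")
# response = client.chat.completions.create(
#     model="Qwen/Qwen3-VL-235B-A22B-Instruct",
#     messages=build_vision_messages(
#         "What is in this image? Describe the main elements.",
#         "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
#     ),
#     max_tokens=1024,
#     stream=False,
# )
# print(response.choices[0].message.content)
```

With `stream=False`, the full response arrives in one object and the text is read from `choices[0].message.content` rather than from streamed deltas.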

Model Overview

Qwen3-VL-235B-A22B-Instruct is a comprehensively upgraded vision-language model in the Qwen3 series with significant improvements in visual coding and spatial perception.
  • Its visual perception and recognition capabilities have significantly improved, including understanding of ultra-long videos and a major enhancement to OCR.
  • A mixture-of-experts model with 235B total parameters (roughly 22B activated per token, hence "A22B") and up to a 128K-token context, it delivers state-of-the-art multimodal quality for complex visual reasoning, scientific diagrams, and chart analysis.

Model at a Glance

Feature          Details
Model ID         Qwen/Qwen3-VL-235B-A22B-Instruct
Provider         Alibaba Cloud (Qwen Team)
Architecture     Transformer decoder-only (Qwen3-VL with ViT visual encoder)
Model Size       235B total parameters (22B activated per token)
Context Length   Up to 128K tokens
Release Date     2025
License          Apache 2.0
Training Data    Multilingual multimodal dataset (text + images)

When to use?

You should consider using Qwen3-VL-235B-A22B-Instruct if:
  • You need complex visual reasoning
  • Your application requires analysis of scientific diagrams
  • Your use case involves chart understanding and data extraction
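For chart understanding and data extraction, the source image is often a local file rather than a public URL. Many OpenAI-compatible endpoints accept images inline as base64 data URLs in the `image_url` field; whether Qubrid's endpoint does is an assumption worth verifying. A minimal sketch (`encode_image_as_data_url` is our own helper name):

```python
import base64
import mimetypes

def encode_image_as_data_url(path: str) -> str:
    """Read a local image file and return it as a base64 data URL,
    suitable for the "url" field of an image_url content part."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "application/octet-stream"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string is used in place of the HTTPS URL in the `image_url` content part, e.g. `{"type": "image_url", "image_url": {"url": encode_image_as_data_url("chart.png")}}`.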

Inference Parameters

Parameter Name    Type     Default   Description
Streaming         boolean  true      Enable streaming responses for real-time output.
Temperature       number   0.7      Controls randomness; higher values produce more diverse output.
Max Tokens        number   8962      Maximum number of tokens the model may generate.
Top P             number   1         Nucleus sampling threshold; lower values make output more predictable.
Reasoning Effort  select   medium    Depth of reasoning and problem-solving effort; higher settings yield more thorough responses at the cost of latency.
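These defaults map directly onto the request body of the chat-completions call. A sketch collecting them in one place (parameter names follow the OpenAI-compatible API; whether `reasoning_effort` is accepted as a request-body field on this endpoint is an assumption, so it is left commented):

```python
# Default inference parameters from the table above, keyed by their
# OpenAI-compatible request-body names.
DEFAULT_PARAMS = {
    "stream": True,        # Streaming
    "temperature": 0.7,    # Temperature
    "max_tokens": 8962,    # Max Tokens
    "top_p": 1,            # Top P
    # "reasoning_effort": "medium",  # Reasoning Effort (endpoint support unverified)
}

def make_request_kwargs(model: str, messages: list, **overrides) -> dict:
    """Merge the table defaults with per-request overrides into the
    keyword arguments for client.chat.completions.create(...)."""
    return {"model": model, "messages": messages, **DEFAULT_PARAMS, **overrides}
```

Per-request overrides win over the defaults, so e.g. `make_request_kwargs(model_id, msgs, temperature=0.2, stream=False)` lowers the temperature and disables streaming for that one call.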

Key Features

  • State-of-the-Art Multimodal Quality: Comprehensively upgraded visual coding and spatial perception over previous Qwen3 VL models.
  • Enhanced OCR: Major improvement to text recognition from images, documents, and real-world scenes.
  • Ultra-Long Video Understanding: Supports understanding of very long video sequences for temporal reasoning tasks.
  • 235B Parameters: Frontier-scale vision-language model delivering maximum accuracy on complex multimodal tasks.
  • Apache 2.0 License: Fully open source under a permissive license that allows commercial use.

Summary

Qwen3-VL-235B-A22B-Instruct is Alibaba’s most capable open-source vision-language model, delivering state-of-the-art multimodal quality.
  • It uses a mixture-of-experts Transformer decoder-only architecture with a ViT visual encoder and 235B total parameters (22B activated per token), trained on a multilingual multimodal dataset.
  • It features comprehensively upgraded visual coding, spatial perception, enhanced OCR, and ultra-long video understanding.
  • The model supports up to 128K context with configurable reasoning effort for complex visual reasoning tasks.
  • Licensed under Apache 2.0 for full commercial use.