
About the Provider

Qwen is an AI model family developed by Alibaba Group, a major Chinese technology and cloud computing company. Through its Qwen initiative, Alibaba builds and open-sources advanced language, image, and coding models under permissive licenses to support innovation, developer tooling, and scalable AI integration across applications.

Model Quickstart

This section helps you quickly get started with the Qwen/Qwen3-VL-8B-Instruct model on the Qubrid AI inferencing platform. To use this model, you need:
  • A valid Qubrid API key
  • Access to the Qubrid inference API
  • Basic knowledge of making API requests in your preferred language
Once authenticated with your API key, you can send inference requests to the Qwen/Qwen3-VL-8B-Instruct model and receive responses based on your input prompts. Below is a Python example showing how the model can be accessed; adapt it to whichever programming environment best fits your workflow.
import requests
import json
from pprint import pprint

url = "https://platform.qubrid.com/api/v1/qubridai/multimodal/chat"
headers = {
    "Authorization": "Bearer <QUBRID_API_KEY>",
    "Content-Type": "application/json"
}

data = {
    "model": "Qwen/Qwen3-VL-8B-Instruct",
    "max_tokens": 4096,
    "temperature": 0.7,
    "stream": False,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe all images in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ]
}

# stream=True lets requests iterate over the body line by line, which is
# needed when the API returns server-sent events ("stream": True in data).
response = requests.post(url, headers=headers, json=data, stream=True)

content_type = response.headers.get("Content-Type", "")
if "application/json" in content_type:
    # Non-streaming response: a single JSON object.
    pprint(response.json())
else:
    # Streaming response: server-sent events, one "data: {...}" line per chunk.
    for line in response.iter_lines(decode_unicode=True):
        if not line:
            continue
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                break
            try:
                chunk = json.loads(payload)
                pprint(chunk)
            except json.JSONDecodeError:
                print("Raw chunk:", payload)

This will produce a response similar to the one below:
{
  "id": "chatcmpl-8908cd8b586c496bbef3bba04edbbe99",
  "object": "chat.completion",
  "created": 1764851200,
  "model": "Qwen/Qwen3-VL-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The image shows the Statue of Liberty standing tall on Liberty Island in New York Harbor, with a clear blue sky in the background.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null,
      "token_ids": null
    }
  ],
  "usage": {
    "prompt_tokens": 128,
    "total_tokens": 158,
    "completion_tokens": 30
  }
}
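Once you have a non-streaming response like the one above, the assistant's reply and the token usage can be pulled out of the parsed JSON. A minimal sketch, using a dict shaped like the sample response (the literal values below are illustrative):

```python
# Hypothetical parsed response, shaped like the sample output above.
response_json = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The image shows the Statue of Liberty standing tall on Liberty Island.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 128, "total_tokens": 158, "completion_tokens": 30},
}

# The reply text lives under choices[0].message.content;
# usage.total_tokens gives the combined prompt + completion token count.
reply = response_json["choices"][0]["message"]["content"]
total_tokens = response_json["usage"]["total_tokens"]
print(reply)
print("tokens used:", total_tokens)
```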

Model Overview

Qwen3 VL 8B Instruct is a vision-language instruction-tuned model designed to understand and reason over both text and images. It supports OCR, streaming responses, and rich multimodal conversations, making it suitable for vision-language inference workflows that require text–image understanding rather than content generation. The model focuses on strong visual perception, spatial reasoning, long-context understanding, and multimodal reasoning while remaining accessible for deployment across different environments.

Model at a Glance

Feature         | Details
Model ID        | Qwen/Qwen3-VL-8B-Instruct
Provider        | Alibaba Cloud (QwenLM)
Model Type      | Vision-Language Instruction-Tuned Model
Architecture    | Transformer decoder-only (Qwen3-VL with ViT visual encoder)
Parameters      | ~9B
Context Length  | 32K tokens
Training Data   | Multilingual multimodal dataset (text + images)

When to use?

Use Qwen3-VL-8B-Instruct if your inference workload requires:
  • Understanding and reasoning over images and text together
  • OCR across multiple languages with structured document understanding
  • Visual question answering and image captioning
  • Multimodal chat with streaming support
  • Spatial reasoning and visual perception without image generation needs
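For workloads like these, each request carries a user message that mixes text and image parts. A small helper, sketched from the message shape in the quickstart example above (the function name, example prompt, and URL are illustrative, not part of the Qubrid API):

```python
def make_vision_message(prompt: str, image_urls: list) -> dict:
    """Build one user message with a text part followed by one image part per URL,
    matching the content layout used in the quickstart example."""
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    return {"role": "user", "content": content}

# Example: a visual question-answering message over a single image.
msg = make_vision_message(
    "What text appears in this document?",
    ["https://example.com/page1.png"],
)
```

The returned dict drops straight into the `messages` list of the request payload.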

Inference Parameters

Parameter Name    | Type    | Default | Description
Streaming         | boolean | true    | Enable streaming responses for real-time output.
Temperature       | number  | 0.7     | Controls randomness in the output.
Max Tokens        | number  | 2048    | Maximum number of tokens to generate.
Top P             | number  | 0.9     | Controls nucleus sampling (cumulative probability cutoff).
Top K             | number  | 50      | Limits sampling to the top-k most likely tokens.
Presence Penalty  | number  | 0       | Penalizes tokens that have already appeared, discouraging repetition.
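These parameters map onto fields in the JSON request body. A hedged sketch of where each one goes; the snake_case field names follow common OpenAI-style chat APIs (and match the quickstart example where they appear there), but verify them against the Qubrid API reference:

```python
# Hypothetical request payload showing each inference parameter in place.
payload = {
    "model": "Qwen/Qwen3-VL-8B-Instruct",
    "stream": True,           # Streaming: incremental output
    "temperature": 0.7,       # Temperature: randomness in sampling
    "max_tokens": 2048,       # Max Tokens: cap on generated tokens
    "top_p": 0.9,             # Top P: nucleus sampling cutoff
    "top_k": 50,              # Top K: restrict to top-k tokens
    "presence_penalty": 0,    # Presence Penalty: discourage repeats
    "messages": [
        {"role": "user", "content": "Summarize this in one sentence."}
    ],
}
```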

Key Features

  • Strong Vision-Language Capabilities: Handles text and image understanding in a unified manner
  • Multilingual OCR: Supports OCR in up to 32 languages with improved robustness
  • Long-Context & Video Understanding: Designed for extended context reasoning within the Qwen3-VL family
  • Streaming Support: Enables fast, incremental response generation
  • Advanced Spatial & Visual Reasoning: Understands object positions, layouts, and visual relationships

Summary

Qwen3 VL 8B Instruct is a vision-language inference model focused on understanding, reasoning, and interaction across text and images. It supports OCR, streaming responses, and multimodal conversations with strong visual perception and spatial reasoning. The model is suited for document analysis, visual QA, and multimodal chat scenarios. It does not perform image generation and is optimized for understanding tasks. Its Apache 2.0 license and instruction-tuned design make it suitable for accessible deployment on inference platforms.