
About the Provider

Tencent is a major Chinese technology company and cloud services provider that develops AI models and research technologies through its Hunyuan AI initiative. The company focuses on creating advanced open-source and commercial AI systems—including vision-language, OCR, and foundation models—to support developers, enterprises, and real-world applications across industries.

Model Quickstart

This section helps you quickly get started with the tencent/HunyuanOCR model on the Qubrid AI inference platform. To use this model, you need:
  • A valid Qubrid API key
  • Access to the Qubrid inference API
  • Basic knowledge of making API requests in your preferred language
Once authenticated with your API key, you can send inference requests to the tencent/HunyuanOCR model and receive responses based on your input prompts. The example below shows how to call the model from Python; adapt it to whichever environment best fits your workflow.
import requests
import json
from pprint import pprint

url = "https://platform.qubrid.com/api/v1/qubridai/chat/completions"

headers = {
    # Replace Qubrid_API_KEY with your actual Qubrid API key.
    "Authorization": "Bearer Qubrid_API_KEY",
    "Content-Type": "application/json",
}

data = {
    "model": "tencent/HunyuanOCR",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    # Optional text instruction; the ocr_mode parameter
                    # below selects the OCR task type.
                    "type": "text",
                    "text": "Extract all text from this image.",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ],
    "max_tokens": 4096,
    "temperature": 0,        # keep at 0 for deterministic text extraction
    "language": "auto",      # or a specific language hint, e.g. "en"
    "ocr_mode": "general",
    "stream": False,         # set to True to receive server-sent events
}

# stream=True here only defers downloading the HTTP body, so the same
# code can handle both a single JSON response and a streamed one.
response = requests.post(url, headers=headers, json=data, stream=True)

content_type = response.headers.get("Content-Type", "")
if "application/json" in content_type:
    # Non-streaming: a single JSON response body.
    pprint(response.json())
else:
    # Streaming: parse server-sent events line by line.
    for line in response.iter_lines(decode_unicode=True):
        if not line:
            continue
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                break
            try:
                chunk = json.loads(payload)
                pprint(chunk)
            except json.JSONDecodeError:
                print("Raw chunk:", payload)
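If you only need the recognized text rather than the full response object, you can pull it out of the completion payload. The helper below is a sketch that assumes an OpenAI-style `choices[0].message.content` schema, which is suggested by the endpoint's `chat/completions` path but not confirmed here; the exact response shape on Qubrid may differ.

```python
def extract_text(completion: dict) -> str:
    """Return the model's text output from an OpenAI-style
    chat-completion response (assumed schema: choices[0].message.content)."""
    choices = completion.get("choices", [])
    if not choices:
        return ""
    content = choices[0].get("message", {}).get("content", "")
    # Some APIs return content as a list of typed parts rather than a string.
    if isinstance(content, list):
        content = "".join(
            part.get("text", "") for part in content if part.get("type") == "text"
        )
    return content

# Example with a minimal mocked response:
sample = {"choices": [{"message": {"content": "STATUE OF LIBERTY"}}]}
print(extract_text(sample))  # STATUE OF LIBERTY
```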

Model Overview

Hunyuan OCR (1B) is an end-to-end OCR-focused vision-language model built on Hunyuan’s native multimodal architecture.
  • It is designed to perform text extraction and document understanding tasks using a single instruction and a single inference step.
  • With a lightweight 1B-parameter size, the model supports multilingual document parsing and multiple OCR-related tasks while remaining efficient to deploy on inference platforms.
  • The model is focused purely on OCR workflows and is not intended for general visual question answering.

Model at a Glance

Feature         Details
Model ID        tencent/HunyuanOCR
Provider        Tencent
Parameters      1B
Context Length  16k tokens
Model Type      OCR-focused Vision-Language Model

When to use?

You should consider using Hunyuan OCR (1B) if:
  • You need an OCR-specific model rather than a general-purpose vision model
  • Your application involves document parsing, text spotting, or subtitle extraction
  • You work with multilingual or mixed-language content
  • You prefer an end-to-end OCR model instead of cascading OCR systems
  • You require a lightweight model optimized for efficient inference
Do not use this model for general visual question answering tasks.

Key Features

  • Efficient Lightweight Architecture: Built on Hunyuan’s native multimodal architecture, achieving strong OCR performance with only 1B parameters and reduced deployment cost.
  • Comprehensive OCR Coverage: Supports text detection, text recognition, complex document parsing, open-field information extraction, video subtitle extraction, photo translation, and document QA within a single model.
  • End-to-End Inference Workflow: Designed around a single-instruction, single-inference approach, avoiding multi-stage OCR pipelines and cascade errors.
  • Multilingual Document Support: Provides robust support for over 100 languages, including mixed-language documents and varied document types.
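As a sketch of how these tasks map onto requests, the snippet below builds message payloads for two of the listed tasks (document QA and photo translation) using the content format from the quickstart example. The prompt wordings and the image URL are illustrative assumptions, not documented task triggers.

```python
def build_message(prompt: str, image_url: str) -> list:
    """Pair a single text instruction with an image in the
    chat/completions content format used in the quickstart."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

IMAGE = "https://example.com/invoice.png"  # hypothetical document image

# Document QA: ask a question about the document's content.
qa_messages = build_message("What is the invoice total?", IMAGE)

# Photo translation: extract and translate the visible text.
translate_messages = build_message("Translate the text in this image to English.", IMAGE)
```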

Inference Parameters

Parameter          Type     Default  Description
Streaming          boolean  true     Enable streaming responses for real-time output.
Language           select   en       Optional language hint to improve OCR accuracy for specific languages.
OCR Mode           select   general  Select an optimized OCR mode based on the image type.
Max Output Tokens  number   4096     Maximum number of tokens for the generated text.
Temperature        number   0        Controls randomness; keep at 0 for accurate text extraction.
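These parameters correspond to fields in the request body shown in the quickstart. The helper below is a minimal sketch of assembling them, using the field names from that example and the defaults from the table above; the set of valid `ocr_mode` values is not listed here, so check the platform for the full list.

```python
def make_request_body(
    stream: bool = True,
    language: str = "en",
    ocr_mode: str = "general",
    max_tokens: int = 4096,
    temperature: float = 0,
) -> dict:
    """Assemble a chat/completions request body with the documented
    inference parameters, defaulted per the parameter table."""
    return {
        "model": "tencent/HunyuanOCR",
        "stream": stream,
        "language": language,
        "ocr_mode": ocr_mode,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

# Non-streaming request keeping the general OCR mode:
body = make_request_body(stream=False)
```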

Performance Characteristics

Strengths

  • Lightweight 1B parameter model with strong OCR accuracy
  • Native handling of high-resolution images and extreme aspect ratios
  • Unified end-to-end architecture without bounding-box error propagation
  • Effective recognition of rotated and vertical text
  • Strong multilingual and mixed-script support

Considerations

  • Designed specifically for OCR, not general visual reasoning
  • May hallucinate on extremely blurred or low-resolution text
  • Throughput depends on visual token density

Summary

Hunyuan OCR (1B) is a lightweight, OCR-focused vision-language model developed by Tencent Hunyuan.
  • It performs end-to-end OCR tasks using a single instruction and single inference step.
  • The model supports multilingual and mixed-language document parsing across images and videos.
  • It is optimized for efficient deployment with its 1B parameter size and fp16 precision.
  • The model is best suited for OCR pipelines rather than general-purpose vision tasks.