HunyuanOCR Technical Report

Hunyuan Vision Team; Pengyuan Lyu; Xingyu Wan; Gengluo Li; Shangpin Peng; Weinong Wang; Liang Wu; Huawen Shen; Yu Zhou; Canhui Tang; Qi Yang; Qiming Peng; Bin Luo; Hower Yang; Xinsong Zhang; Jinnian Zhang; Houwen Peng; Hongming Yang; Senhao Xie; Longsha Zhou; Ge Pei; Binghong Wu; Rui Yan; Kan Wu; Jieneng Yang; Bochao Wang; Kai Liu; Jianchen Zhu; Jie Jiang; Linus; Han Hu; Chengquan Zhang

arXiv:2511.19575 · cs.CV · December 12, 2025

TL;DR

HunyuanOCR is a lightweight, open-source Vision-Language Model that achieves state-of-the-art performance on OCR tasks. It combines versatility and efficiency in an end-to-end design, with significant gains attributed to high-quality data and RL strategies.

Contribution

The paper introduces HunyuanOCR, a novel lightweight VLM that unifies multiple OCR capabilities in an end-to-end framework, surpassing larger models and commercial solutions.

Findings

01. Outperforms commercial APIs and larger models on OCR perception and semantic tasks.

02. Achieves first place in the ICDAR 2025 DIMT Challenge (Small Model Track).

03. Demonstrates that RL strategies improve OCR performance.

Abstract

This paper presents HunyuanOCR, a commercial-grade, open-source, and lightweight (1B parameters) Vision-Language Model (VLM) dedicated to OCR tasks. The architecture comprises a Native Vision Transformer (ViT) and a lightweight LLM connected via an MLP adapter. HunyuanOCR demonstrates superior performance, outperforming commercial APIs, traditional pipelines, and larger models (e.g., Qwen3-VL-4B). Specifically, it surpasses current public solutions in perception tasks (Text Spotting, Parsing) and excels in semantic tasks (IE, Text Image Translation), securing first place in the ICDAR 2025 DIMT Challenge (Small Model Track). Furthermore, it achieves state-of-the-art (SOTA) results on OCRBench among VLMs with fewer than 3B parameters. HunyuanOCR achieves breakthroughs in three key aspects: 1) Unifying Versatility and Efficiency: We implement comprehensive support for core capabilities…
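
The abstract describes a vision encoder and a language model joined by an MLP adapter. The following is a minimal sketch of that general pattern, not the authors' implementation: the layer sizes, the two-layer shape of the adapter, the GELU activation, and the ordering of image tokens before text tokens are all assumptions made for illustration, and the random vectors merely stand in for real ViT outputs and text embeddings.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(x, layer):
    """Apply a dense layer to a single vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def gelu(v):
    """Tanh approximation of GELU (the activation choice is an assumption)."""
    return 0.5 * v * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def mlp_adapter(vision_tokens, fc1, fc2):
    """Project each vision token into the language model's embedding width."""
    return [linear([gelu(h) for h in linear(tok, fc1)], fc2)
            for tok in vision_tokens]


# Hypothetical sizes, chosen only to keep the example small.
VISION_DIM, HIDDEN_DIM, LLM_DIM = 8, 16, 12

rng = random.Random(0)
fc1 = make_linear(VISION_DIM, HIDDEN_DIM, rng)
fc2 = make_linear(HIDDEN_DIM, LLM_DIM, rng)

# Stand-in for vision encoder output: 5 patch tokens of width VISION_DIM.
vision_tokens = [[rng.gauss(0.0, 1.0) for _ in range(VISION_DIM)]
                 for _ in range(5)]
# Stand-in for an embedded text prompt: 3 tokens of width LLM_DIM.
text_tokens = [[0.0] * LLM_DIM for _ in range(3)]

projected = mlp_adapter(vision_tokens, fc1, fc2)
# Assumed ordering: image tokens first, then the text prompt.
llm_input = projected + text_tokens
print(len(llm_input), len(llm_input[0]))
```

The point of the adapter in this sketch is only to change the token width, so that image tokens and text tokens can share one input sequence for the language model.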

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis