ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai

Surapon Nonesung; Teetouch Jaknamon; Sirinya Chaiophat; Natapong Nitarach; Chanakan Wittayasakpan; Warit Sirichotedumrong; Adisai Na-Thalang; Kunat Pipatanakul

arXiv:2511.04479 · cs.CL · December 5, 2025

TL;DR

ThaiOCRBench is a comprehensive benchmark designed to evaluate vision-language models on diverse Thai text-rich visual understanding tasks, highlighting performance gaps and challenges in low-resource, script-complex settings.

Contribution

The paper introduces the first diverse, human-annotated Thai benchmark for vision-language understanding, comprising 2,808 samples across 13 task categories, and evaluates a range of proprietary and open-source models in a zero-shot setting.

Findings

1. Proprietary models outperform open-source models.
2. Fine-grained text recognition and handwritten content extraction are challenging for open-source models.
3. Key challenges include language bias, structural mismatch, and hallucinated content.

Abstract

We present ThaiOCRBench, the first comprehensive benchmark for evaluating vision-language models (VLMs) on Thai text-rich visual understanding tasks. Despite recent progress in multimodal modeling, existing benchmarks predominantly focus on high-resource languages, leaving Thai underrepresented, especially in tasks requiring document structure understanding. ThaiOCRBench addresses this gap by offering a diverse, human-annotated dataset comprising 2,808 samples across 13 task categories. We evaluate a wide range of state-of-the-art VLMs in a zero-shot setting, spanning both proprietary and open-source systems. Results show a significant performance gap, with proprietary models (e.g., Gemini 2.5 Pro) outperforming open-source counterparts. Notably, fine-grained text recognition and handwritten content extraction exhibit the steepest performance drops among open-source models. Through…

Code & Models

Datasets

typhoon-ai/ThaiOCRBench
dataset · 1.3k downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques · Topic Modeling