Typhoon OCR: Open Vision-Language Model For Thai Document Extraction
Surapon Nonesung, Natapong Nitarach, Teetouch Jaknamon, Pittawat Taveekitworachai, Kunat Pipatanakul

TL;DR
Typhoon OCR is an open, lightweight vision-language model designed specifically for Thai and English document extraction. It achieves high accuracy across diverse real-world Thai documents while keeping computational cost low.
Contribution
The paper introduces Typhoon OCR, a novel open VLM tailored for Thai document extraction. The model is trained on data produced by a multi-stage dataset construction pipeline and achieves performance competitive with proprietary models.
Findings
Achieves performance comparable to proprietary models on Thai documents.
Reduces computational cost with a compact, inference-efficient model.
Demonstrates effective text and layout extraction across diverse Thai document types.
Abstract
Document extraction is a core component of digital workflows, yet existing vision-language models (VLMs) predominantly favor high-resource languages. Thai presents additional challenges: a complex non-Latin script, the absence of explicit word boundaries, and the prevalence of highly unstructured real-world documents, all of which limit the effectiveness of current open-source models. This paper presents Typhoon OCR, an open VLM for document extraction tailored to Thai and English. The model is fine-tuned from vision-language backbones on a Thai-focused training dataset, which is developed with a multi-stage data construction pipeline that combines traditional OCR, VLM-based restructuring, and curated synthetic data. Typhoon OCR is a unified framework that performs text transcription and layout reconstruction while maintaining document-level structural consistency. The latest…
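To make two of the Thai-specific challenges above concrete, here is a small illustration using only Python's standard library. The sample phrase and the helper names (`whitespace_tokens`, `combining_marks`) are our own hypothetical choices, not part of the paper or of Typhoon OCR. It shows that whitespace splitting finds a single token in a two-word Thai greeting, and that several of its code points are nonspacing vowel marks that render stacked on a base consonant rather than as separate glyphs on the line.

```python
import unicodedata

# "สวัสดีครับ" ("hello", polite form): the words "สวัสดี" + "ครับ" with no space
# between them. Written as escapes so every code point is explicit.
thai = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35\u0e04\u0e23\u0e31\u0e1a"
english = "hello there"


def whitespace_tokens(text: str) -> list[str]:
    """Naive tokenizer: split on whitespace, as many Latin-script pipelines do."""
    return text.split()


def combining_marks(text: str) -> list[str]:
    """Return the Unicode names of nonspacing marks (category 'Mn') in text."""
    return [unicodedata.name(ch) for ch in text if unicodedata.category(ch) == "Mn"]


print(whitespace_tokens(english))    # ['hello', 'there']
print(len(whitespace_tokens(thai)))  # 1: two words, but no space to split on
print(len(thai), "code points")      # consonants plus stacked vowel marks
print(combining_marks(thai))         # the marks that sit above their base letters
```

Because word boundaries are implicit and some characters occupy no horizontal position of their own, character-level transcription is the more natural unit for Thai OCR output than whitespace-delimited words.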
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Topic Modeling
