Finetuning Vision-Language Models as OCR Systems for Low-Resource Languages: A Case Study of Manchu

Yan Hon Michael Chung; Donghyeok Choi

arXiv:2507.06761 · cs.CV · July 10, 2025

TL;DR

This paper develops a cost-effective OCR system for the endangered Manchu language by fine-tuning vision-language models on synthetic data, achieving high accuracy on real-world historical documents and enabling digital humanities research.

Contribution

It introduces a novel fine-tuning approach for vision-language models on synthetic data to effectively recognize historical Manchu documents, facilitating low-resource language digitization.

Findings

01

LLaMA-3.2-11B achieved 93.1% accuracy on real handwritten documents.

02

Models trained only on synthetic word images transferred effectively to real handwritten documents.

03

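Compared to a traditional CRNN baseline, whose accuracy fell from 99.8% on synthetic data to 72.5% on real documents, the proposed approach maintains high accuracy on real documents with lower resource requirements.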

Abstract

Manchu, a critically endangered language essential for understanding early modern Eastern Eurasian history, lacks effective OCR systems that can handle real-world historical documents. This study develops high-performing OCR systems by fine-tuning three open-source vision-language models (LLaMA-3.2-11B, Qwen2.5-VL-7B, Qwen2.5-VL-3B) on 60,000 synthetic Manchu word images using parameter-efficient training. LLaMA-3.2-11B achieved exceptional performance with 98.3% word accuracy and 0.0024 character error rate on synthetic data, while crucially maintaining 93.1% accuracy on real-world handwritten documents. Comparative evaluation reveals substantial advantages over traditional approaches: while a CRNN baseline achieved 99.8% synthetic accuracy, it suffered severe degradation to 72.5% on real documents. Our approach demonstrates effective synthetic-to-real domain transfer, providing a…
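
The two metrics quoted in the abstract, word accuracy and character error rate (CER), can be sketched as follows. This is a minimal illustration under common definitions, not the authors' evaluation code: word accuracy is assumed to be the fraction of exact-match predictions, and CER is assumed to be total Levenshtein edit distance divided by total reference length. The function names and the romanized sample strings are hypothetical, and the paper's actual output encoding is not stated here.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Fraction of predictions that match the reference word exactly (assumed definition)."""
    if len(preds) != len(refs) or not refs:
        raise ValueError("preds and refs must be non-empty and of equal length")
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def character_error_rate(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Total edit distance over total reference characters (assumed definition)."""
    if len(preds) != len(refs) or not refs:
        raise ValueError("preds and refs must be non-empty and of equal length")
    total_edits = sum(levenshtein(p, r) for p, r in zip(preds, refs))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical romanized word labels, for illustration only.
    refs = ["manju", "gisun", "bithe"]
    preds = ["manju", "gisum", "bithe"]
    print(word_accuracy(preds, refs))         # 2 of 3 words match exactly
    print(character_error_rate(preds, refs))  # 1 edit over 15 reference characters
```

Under these definitions a single wrong character costs a full word in word accuracy but only one edit in CER, which is why a model can report a very low CER alongside a word accuracy a few points below 100%.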

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Digital Humanities and Scholarship