Enhancement of text recognition for hanja handwritten documents of Ancient Korea
Joonmo Ahna, Taehong Jang, Quan Fengnyu, Hyungil Lee, Jaehyuk Lee, and Sojung Lucia Kim

TL;DR
This paper presents a high-performance OCR model for ancient Korean hanja handwritten documents, utilizing data augmentation and a two-stage detection approach to achieve 90% recognition accuracy on cursive scripts.
Contribution
The study introduces a novel OCR method combining data augmentation and HRNet for recognizing complex hanja handwritten texts, addressing unique stylistic and historical challenges.
Findings
Achieved 90% recognition accuracy on cursive hanja documents.
Data augmentation with variable cropping improves OCR performance.
Performance is influenced by character variants and stylistic features.
Abstract
We implemented a high-performance optical character recognition (OCR) model for classical handwritten documents, using data augmentation with highly variable cropping within the document region. OCR of handwritten documents, and of classical documents in particular, remains a difficult problem that researchers and organizations in many countries have worked on. Despite this body of work, the deterioration of classical texts over time and the distinctive styles of individual writers keep the task hard. Recognizing hanja handwritten documents is a meaningful and distinctive challenge in its own right, because hanja, which developed to reflect the vocabulary, semantics, and syntax of the Joseon Dynasty, differs from classical Chinese characters. To study this challenge, we used 1100 cursive…
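The abstract's central augmentation idea, highly variable cropping within the document region, can be illustrated with a small sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `variable_crop`, the scale range, sampling width and height independently, representing labels as per-character bounding boxes, and dropping any box with less than half of its area inside the crop (`min_keep`) are all hypothetical choices made for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical label format: one (x0, y0, x1, y1) box per character, in page pixels.
Box = Tuple[float, float, float, float]


def variable_crop(page_w: float, page_h: float, boxes: List[Box],
                  rng: random.Random, min_scale: float = 0.3,
                  max_scale: float = 1.0, min_keep: float = 0.5):
    """Sample a random crop window inside the page and remap character boxes.

    Illustrative sketch only: the scale range and the keep rule are
    assumptions, not settings reported in the paper.
    """
    # Sample width and height fractions independently, so both the size
    # and the aspect ratio of the crop vary from sample to sample.
    cw = page_w * rng.uniform(min_scale, max_scale)
    ch = page_h * rng.uniform(min_scale, max_scale)
    # Place the window uniformly so that it stays entirely inside the page.
    cx0 = rng.uniform(0, page_w - cw)
    cy0 = rng.uniform(0, page_h - ch)
    cx1, cy1 = cx0 + cw, cy0 + ch

    kept = []
    for x0, y0, x1, y1 in boxes:
        area = (x1 - x0) * (y1 - y0)
        if area <= 0:
            continue  # skip degenerate boxes
        # Intersect the character box with the crop window.
        ix0, iy0 = max(x0, cx0), max(y0, cy0)
        ix1, iy1 = min(x1, cx1), min(y1, cy1)
        if ix1 <= ix0 or iy1 <= iy0:
            continue  # character lies outside the crop
        visible = (ix1 - ix0) * (iy1 - iy0) / area
        if visible < min_keep:
            continue  # mostly cut off: drop it rather than label a fragment
        # Shift the clipped box into crop-relative coordinates.
        kept.append((ix0 - cx0, iy0 - cy0, ix1 - cx0, iy1 - cy0))
    return (cx0, cy0, cx1, cy1), kept


if __name__ == "__main__":
    rng = random.Random(0)
    chars = [(10, 20, 30, 40), (50, 50, 70, 90), (300, 400, 330, 440)]
    crop, kept = variable_crop(400, 600, chars, rng)
    print(crop, kept)
```

Sampling the two crop dimensions independently exposes a detector to characters at many relative sizes, positions, and page contexts. Dropping heavily truncated boxes avoids training on partial glyphs that are labeled as whole characters; a real pipeline would also crop the image pixels with the same window.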
Taxonomy
Topics: Handwritten Text Recognition Techniques
