Training Kindai OCR with parallel textline images and self-attention feature distance-based loss
Anh Le, Asanobu Kitamoto

TL;DR
This paper proposes a novel training method for OCR on historical Japanese documents using parallel textline images and a self-attention feature distance loss, improving accuracy and domain adaptation.
Contribution
It introduces a distance-based loss function utilizing parallel images and self-attention features to enhance OCR performance on scarce historical data.
Findings
Reduced character error rate by up to 3.94%
Improved self-attention feature discrimination
Effective domain adaptation for historical OCR
Abstract
Kindai documents, written in modern Japanese from the late 19th to early 20th century, hold significant historical value for researchers studying societal structures, daily life, and environmental conditions of that period. However, transcribing these documents remains a labor-intensive and time-consuming task, resulting in limited annotated data for training optical character recognition (OCR) systems. This research addresses the challenge of data scarcity by leveraging parallel textline images (pairs of original Kindai text and their counterparts in contemporary Japanese fonts) to augment training datasets. We introduce a distance-based objective function that minimizes the gap between self-attention features of the parallel image pairs. Specifically, we explore Euclidean distance and Maximum Mean Discrepancy (MMD) as domain adaptation metrics. Experimental results demonstrate that…
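The two distance metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature layout (a list of per-position vectors for each textline image), the position-wise pairing in the Euclidean variant, the Gaussian kernel, its bandwidth `sigma`, and all function names are assumptions made for illustration.

```python
import math


def squared_euclidean(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean_euclidean_distance(feats_a, feats_b):
    """Mean Euclidean distance between position-aligned feature vectors.

    Assumes both images yield the same number of feature vectors
    (an illustrative simplification).
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("feature sequences must have the same length")
    total = sum(math.sqrt(squared_euclidean(a, b))
                for a, b in zip(feats_a, feats_b))
    return total / len(feats_a)


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; sigma is a hypothetical bandwidth choice."""
    return math.exp(-squared_euclidean(x, y) / (2.0 * sigma ** 2))


def mmd_squared(feats_a, feats_b, sigma=1.0):
    """Biased estimate of squared MMD between two sets of feature vectors."""
    def mean_kernel(xs, ys):
        return sum(gaussian_kernel(x, y, sigma)
                   for x in xs for y in ys) / (len(xs) * len(ys))

    return (mean_kernel(feats_a, feats_a)
            + mean_kernel(feats_b, feats_b)
            - 2.0 * mean_kernel(feats_a, feats_b))


# Toy features standing in for self-attention outputs of a parallel pair:
# one list for the original Kindai image, one for its contemporary-font twin.
kindai_feats = [[0.0, 0.0], [1.0, 1.0]]
modern_feats = [[3.0, 4.0], [1.0, 1.0]]

euclid = mean_euclidean_distance(kindai_feats, modern_feats)
mmd2 = mmd_squared(kindai_feats, modern_feats)
```

Both quantities are zero when the two feature sets coincide and grow as they diverge, which is the property a distance-based training term relies on. Unlike the position-wise Euclidean variant, MMD compares the two sets as distributions, so it does not require the features to be aligned.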
Taxonomy
Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Image Processing and 3D Reconstruction
