Training Kindai OCR with parallel textline images and self-attention feature distance-based loss

Anh Le; Asanobu Kitamoto

arXiv:2508.08537·cs.CV·August 13, 2025

TL;DR

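This paper proposes a training method for OCR on historical Japanese (Kindai) documents. Each original textline image is paired with a rendering of the same text in a contemporary font, and a loss on the distance between the pair's self-attention features is added during training, reducing the character error rate by up to 3.94%.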

Contribution

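It introduces a distance-based loss, computed with either Euclidean distance or Maximum Mean Discrepancy (MMD), that minimizes the gap between the self-attention features of parallel image pairs, so that an OCR model can be trained despite scarce annotated historical data.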

Findings

01. Reduced character error rate by up to 3.94%
02. Improved self-attention feature discrimination
03. Effective domain adaptation for historical OCR

Abstract

Kindai documents, written in modern Japanese from the late 19th to early 20th century, hold significant historical value for researchers studying societal structures, daily life, and environmental conditions of that period. However, transcribing these documents remains a labor-intensive and time-consuming task, resulting in limited annotated data for training optical character recognition (OCR) systems. This research addresses the challenge of data scarcity by leveraging parallel textline images (pairs of original Kindai text and their counterparts in contemporary Japanese fonts) to augment training datasets. We introduce a distance-based objective function that minimizes the gap between self-attention features of the parallel image pairs. Specifically, we explore Euclidean distance and Maximum Mean Discrepancy (MMD) as domain adaptation metrics. Experimental results demonstrate that…
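
The two distance measures named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the kernel, the MMD estimator, or how the distance is weighted against the recognition loss, so the Gaussian (RBF) kernel, the biased estimator, the `bandwidth` and `weight` parameters, and all function names below are assumptions made for illustration. Features are plain lists of vectors, one vector per position in a textline.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean_feature_distance(feats_a, feats_b):
    """Mean Euclidean distance between position-aligned feature vectors.

    Assumes both images of a parallel pair yield the same number of
    feature vectors (an assumption; the abstract does not say so).
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("feature sequences must have the same length")
    total = sum(math.sqrt(squared_distance(u, v))
                for u, v in zip(feats_a, feats_b))
    return total / len(feats_a)


def rbf_kernel(u, v, bandwidth):
    """Gaussian kernel; the kernel choice is an assumption."""
    return math.exp(-squared_distance(u, v) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(u, v, bandwidth) for u in xs for v in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy.

    Unlike the Euclidean variant, this compares the two feature sets as
    distributions, so no position-wise alignment is required.
    """
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def combined_loss(recognition_loss, feature_distance, weight=0.1):
    """Hypothetical total objective: recognition loss plus a weighted
    feature distance. The weighting scheme is an assumption."""
    return recognition_loss + weight * feature_distance


if __name__ == "__main__":
    kindai = [[0.0, 0.0], [0.0, 1.0]]   # toy features, original image
    modern = [[0.5, 0.0], [0.0, 1.5]]   # toy features, contemporary font
    print(euclidean_feature_distance(kindai, modern))
    print(mmd_squared(kindai, modern))
```

In a real training loop the same quantities would be computed on tensors inside an automatic-differentiation framework so that the distance term contributes gradients to the encoder. The pure-Python version here only shows what each measure computes.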

Taxonomy

Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Image Processing and 3D Reconstruction