Text Change Detection in Multilingual Documents Using Image Comparison
Doyoung Park, Naresh Reddy Yarram, Sunjin Kim, Minkyu Kim, Seongho Cho, Taehee Lee

TL;DR
This paper introduces a novel image comparison approach for detecting text changes in multilingual documents, bypassing OCR limitations and improving accuracy across diverse languages.
Contribution
It presents a word-level image comparison model with multi-scale attention features for multilingual text change detection, along with a new benchmark dataset.
Findings
Outperforms OCR-based methods in multilingual scenarios
Effective change segmentation without explicit text alignment
Validated on multiple datasets including a new benchmark
Abstract
Document comparison typically relies on optical character recognition (OCR) as its core technology. However, OCR requires the selection of appropriate language models for each document and the performance of multilingual or hybrid models remains limited. To overcome these challenges, we propose text change detection (TCD) using an image comparison model tailored for multilingual documents. Unlike OCR-based approaches, our method employs word-level text image-to-image comparison to detect changes. Our model generates bidirectional change segmentation maps between the source and target documents. To enhance performance without requiring explicit text alignment or scaling preprocessing, we employ correlations among multi-scale attention features. We also construct a benchmark dataset comprising actual printed and scanned word pairs in various languages to evaluate our model. We validate…
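The abstract gives no architectural detail, so the following is only a toy illustration of the general idea of alignment-free, bidirectional change maps built from feature correlations at more than one scale. It is not the authors' model: raw pixel tiles stand in for learned attention features, and the names `tile_features` and `change_maps`, the tile sizes, and the constant bias channel are all assumptions made for this sketch.

```python
import numpy as np

def tile_features(img, size):
    """Cut a 2-D grayscale image (values in [0, 1]) into size x size tiles
    and return one unit-length feature vector per tile plus the grid shape.
    A constant bias channel is appended so blank tiles still match each other."""
    gh, gw = img.shape[0] // size, img.shape[1] // size
    tiles = img[:gh * size, :gw * size].reshape(gh, size, gw, size)
    feats = tiles.transpose(0, 2, 1, 3).reshape(gh * gw, size * size)
    feats = np.hstack([feats.astype(np.float64), np.ones((gh * gw, 1))])
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    return feats, (gh, gw)

def change_maps(src, tgt, sizes=(2, 4)):
    """Return (src_map, tgt_map): per-pixel change scores in [0, 1].
    Each tile is scored by 1 - (best cosine similarity to ANY tile in the
    other image), so no explicit alignment between the images is needed.
    Assumption: both image shapes are divisible by every tile size."""
    for s in sizes:
        assert all(d % s == 0 for d in src.shape + tgt.shape)
    src_maps, tgt_maps = [], []
    for s in sizes:
        fs, gs = tile_features(src, s)
        ft, gt = tile_features(tgt, s)
        corr = fs @ ft.T  # all-pairs correlation, shape (n_src, n_tgt)
        s_score = np.clip(1.0 - corr.max(axis=1), 0.0, 1.0).reshape(gs)
        t_score = np.clip(1.0 - corr.max(axis=0), 0.0, 1.0).reshape(gt)
        # Upsample tile scores back to pixel resolution.
        src_maps.append(np.kron(s_score, np.ones((s, s))))
        tgt_maps.append(np.kron(t_score, np.ones((s, s))))
    # Average the per-scale maps into one map per direction.
    return np.mean(src_maps, axis=0), np.mean(tgt_maps, axis=0)
```

In this sketch, a stroke that appears only in the target image raises scores in the target map, while the source map stays near zero because every source tile still has a match somewhere in the target. The two images may differ in size, since each map is computed on its own grid.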
Taxonomy
Topics: Text and Document Classification Technologies · Authorship Attribution and Profiling · Handwritten Text Recognition Techniques
Methods: Softmax · Attention Is All You Need
