Adaptive Text Recognition through Visual Matching
Chuhan Zhang, Ankush Gupta, Andrew Zisserman

TL;DR
This paper presents a text recognition model based on shape matching. It generalizes across fonts, languages, and characters without retraining, and it is reported to outperform existing methods.
Contribution
The proposed model decouples visual and linguistic learning, enabling shape matching for better generalization and class flexibility in text recognition tasks.
Findings
Generalizes to unseen fonts without new exemplars
Flexibly changes number of classes with different exemplars
Handles new languages and characters without retraining
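The findings above rest on treating recognition as matching against exemplar glyphs rather than classifying into a fixed set of outputs. A minimal sketch of that idea follows. It is not the authors' architecture: the raw-pixel `embed` function stands in for a learned visual representation, the fixed-width slicing and the 3x3 glyphs are hypothetical simplifications, and no linguistic modelling stage is included. All names (`embed`, `recognize`, `render`, `GLYPHS`) are invented for illustration.

```python
import numpy as np


def embed(patch):
    """Stand-in for a learned visual encoder (assumption): flatten and L2-normalize pixels."""
    v = np.asarray(patch, dtype=float).ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def recognize(line, exemplars, glyph_width):
    """Label each fixed-width slice of `line` with its most similar exemplar glyph."""
    labels = list(exemplars)
    # One embedding per class; the class set is whatever exemplars were passed in.
    bank = np.stack([embed(exemplars[c]) for c in labels])
    out = []
    for x in range(0, line.shape[1] - glyph_width + 1, glyph_width):
        query = embed(line[:, x:x + glyph_width])
        sims = bank @ query  # cosine similarity, since all vectors are unit length
        out.append(labels[int(np.argmax(sims))])
    return "".join(out)


def render(text, glyphs):
    """Build a toy text-line image by concatenating glyph bitmaps side by side."""
    return np.hstack([glyphs[c] for c in text])


# Hypothetical 3x3 glyph exemplars for a three-character alphabet.
GLYPHS = {
    "I": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    "L": np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]]),
    "T": np.array([[1, 1, 1], [0, 1, 0], [0, 1, 0]]),
}

# Adding a class needs only a new exemplar, with no change to `recognize`.
EXTENDED = dict(GLYPHS, O=np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]]))

print(recognize(render("LIT", GLYPHS), GLYPHS, 3))
print(recognize(render("LOT", EXTENDED), EXTENDED, 3))
```

In this toy setting, the number of classes is determined entirely by the exemplar dictionary passed in, which is the kind of flexibility the second finding describes. Generalization to unseen fonts would depend on the quality of the learned representation, which this sketch does not attempt to model.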
Abstract
In this work, our objective is to address the problems of generalization and flexibility for text recognition in documents. We introduce a new model that exploits the repetitive nature of characters in languages, and decouples the visual representation learning and linguistic modelling stages. By doing this, we turn text recognition into a shape matching problem, and thereby achieve generalization in appearance and flexibility in classes. We evaluate the new model on both synthetic and real datasets across different alphabets and show that it can handle challenges that traditional architectures are not able to solve without expensive retraining, including: (i) it can generalize to unseen fonts without new exemplars from them; (ii) it can flexibly change the number of classes, simply by changing the exemplars provided; and (iii) it can generalize to new languages and new characters that…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
