Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter
Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Dezhi Peng, Zhe Li, Mengchao He, Yongpan Wang, Canjie Luo

TL;DR
This paper introduces Implicit Feature Alignment (IFA), a novel method that enables standard text recognizers to process multi-line text without relying on text detection, achieving state-of-the-art results in document recognition.
Contribution
The paper proposes IFA, a new inference paradigm that integrates into existing recognizers, allowing them to handle multi-line text and improve OCR performance.
Findings
State-of-the-art end-to-end document recognition performance
Fast inference speed maintained
Effective integration of IFA with attention-based and CTC-based recognizers
Abstract
Text recognition is a popular research subject with many associated challenges. Despite the considerable progress made in recent years, the text recognition task itself is still constrained to solve the problem of reading cropped line text images and serves as a subtask of optical character recognition (OCR) systems. As a result, the final text recognition result is limited by the performance of the text detector. In this paper, we propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA), which can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFA-inference. This enables an ordinary text recognizer to process multi-line text such that text detection can be completely freed. Specifically, we integrate IFA into the two most prevailing text recognition streams (attention-based and CTC-based) and propose…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Music and Audio Processing
