2D-CTC for Scene Text Recognition
Zhaoyi Wan, Fengming Xie, Yibo Liu, Xiang Bai, Cong Yao

TL;DR
This paper introduces 2D-CTC, a novel extension of the traditional CTC model that leverages the 2D spatial structure of scene text to improve recognition accuracy, robustness, and interpretability.
Contribution
The paper proposes 2D-CTC, a new model that extends CTC to two dimensions, effectively handling various text orientations and shapes in scene text recognition.
Findings
Outperforms state-of-the-art methods on standard benchmarks.
Handles irregular and curved text more effectively.
Faster training and testing compared to prior approaches.
Abstract
Scene text recognition has been an important, active research topic in computer vision for years. Previous approaches mainly consider text as 1D signals and cast scene text recognition as a sequence prediction problem, by means of CTC or attention-based encoder-decoder frameworks, which were originally designed for speech recognition. However, unlike speech, which is a 1D signal, text instances are essentially distributed in 2D image spaces. To adhere to and make use of the 2D nature of text for higher recognition accuracy, we extend the vanilla CTC model to a second dimension, thus creating 2D-CTC. 2D-CTC can adaptively concentrate on the most relevant features while excluding the impact of clutter and noise in the background; it can also naturally handle text instances with various forms (horizontal, oriented and curved) while giving more interpretable intermediate…
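To make the 1D-versus-2D distinction concrete, here is a minimal sketch in plain Python. `ctc_greedy_decode` is standard best-path decoding for vanilla CTC (take the argmax per step, merge repeats, drop blanks). `collapse_height` is a hypothetical, simplified stand-in for using the height dimension: it reduces an H x W x C probability map to W x C with per-column height weights. It is not the paper's formulation, whose details this summary does not give, and the names, the blank index, and the toy numbers are all assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Vanilla 1D CTC best-path decoding over a W x C probability sequence."""
    # Pick the most likely class at every horizontal step.
    best = [max(range(len(step)), key=step.__getitem__) for step in probs]
    decoded: List[int] = []
    prev = None
    for label in best:
        # Merge consecutive repeats, then drop blanks.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def collapse_height(prob_map, height_weights):
    """Illustrative only: reduce prob_map[h][w][c] to a W x C sequence.

    height_weights[h][w] is assumed to sum to 1 over h for each column w,
    so each column becomes a weighted average of its rows.
    """
    H, W, C = len(prob_map), len(prob_map[0]), len(prob_map[0][0])
    return [
        [sum(height_weights[h][w] * prob_map[h][w][c] for h in range(H)) for c in range(C)]
        for w in range(W)
    ]


# Toy example: 2 rows, 4 columns, 3 classes (0 = blank, 1 = 'a', 2 = 'b').
# The text drifts from the top row to the bottom row, as tilted text would.
prob_map = [
    [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05], [0.9, 0.05, 0.05]],
    [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],
]
height_weights = [
    [0.9, 0.9, 0.5, 0.1],  # weight on the top row, per column
    [0.1, 0.1, 0.5, 0.9],  # weight on the bottom row, per column
]

collapsed = collapse_height(prob_map, height_weights)
decoded = ctc_greedy_decode(collapsed)
print(decoded)  # [1, 2]
```

Decoding either row alone would miss one of the two characters; weighting rows per column recovers both, which is the intuition behind treating text as a 2D signal.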
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques
