DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting
Seonghyeon Kim, Seung Shin, Yoonsik Kim, Han-Cheol Cho, Taeho Kil, Jaeheung Surh, Seunghyun Park, Bado Lee, Youngmin Baek

TL;DR
DEER introduces a detection-agnostic end-to-end scene text recognizer that relies on reference points rather than precise detection, improving robustness to detection errors and simplifying the text spotting process.
Contribution
It proposes a novel detection-agnostic framework that reduces dependency on detection accuracy by using reference points, enabling effective recognition without detailed region annotations.
Findings
Achieves competitive results on standard benchmarks.
Demonstrates robustness to detection errors.
Eliminates need for arbitrarily-shaped detectors.
Abstract
Recent end-to-end scene text spotters have achieved great improvement in recognizing arbitrary-shaped text instances. Common approaches for text spotting use region of interest pooling or segmentation masks to restrict features to single text instances. However, this makes it hard for the recognizer to decode correct sequences when the detection is not accurate, i.e., when one or more characters are cropped out. Considering that it is hard to accurately decide word boundaries with only the detector, we propose a novel Detection-agnostic End-to-End Recognizer (DEER) framework. The proposed method reduces the tight dependency between detection and recognition modules by bridging them with a single reference point for each text instance, instead of using detected regions. The proposed method allows the decoder to recognize the texts that are indicated by the reference point, with features from…
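The contrast the abstract draws between detected regions and a single reference point can be illustrated with a small sketch. This is not the authors' implementation: the box coordinates, the character positions, and the choice of the box center as the reference point are all assumptions made for illustration, and the sketch does not model how the decoder gathers features, since the abstract is cut off at that point. It only shows that when a detection clips a character, a crop loses that character, while the point derived from the same detection still lands on the intended word.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_center(box: Box) -> Point:
    """Reduce a detected box to one reference point (assumed: its center)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def contains(box: Box, p: Point) -> bool:
    """Return True if point p lies inside (or on the border of) box."""
    x0, y0, x1, y1 = box
    return x0 <= p[0] <= x1 and y0 <= p[1] <= y1


def chars_inside(box: Box, char_centers: List[Point]) -> int:
    """Count characters whose centers fall inside a cropped region."""
    return sum(contains(box, c) for c in char_centers)


# Hypothetical ground truth: a 5-character word spanning x in [0, 100].
true_box: Box = (0.0, 0.0, 100.0, 20.0)
char_centers = [(10.0 + 20.0 * i, 10.0) for i in range(5)]

# Hypothetical inaccurate detection that cuts off the last character.
detected_box: Box = (0.0, 0.0, 78.0, 20.0)

# Region-based pipeline: the recognizer only sees what is inside the crop.
visible = chars_inside(detected_box, char_centers)

# Reference-point pipeline: only the center is passed on. It still falls
# inside the true word, so it still indicates the intended text instance.
ref = box_center(detected_box)
still_on_word = contains(true_box, ref)

print(f"characters visible in crop: {visible} of {len(char_centers)}")
print(f"reference point {ref} lies on the true word: {still_on_word}")
```

In this toy setting the crop exposes only four of the five characters, whereas the reference point shifts slightly but stays on the word. Whether the recognizer then reads the full word depends on the decoder, which the sketch leaves out.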
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Vehicle License Plate Recognition
