Explicit Relational Reasoning Network for Scene Text Detection
Yuchen Su, Zhineng Chen, Yongkun Du, Zhilong Ji, Kai Hu, Jinfeng Bai, Xieping Gao

TL;DR
ERRNet introduces an end-to-end scene text detection method that models component relationships explicitly, eliminating post-processing and achieving state-of-the-art accuracy with high efficiency.
Contribution
The paper proposes ERRNet, a novel relational reasoning network that treats text components as objects in a tracking framework, removing the need for post-processing in CC-based text detection.
Findings
Achieves state-of-the-art accuracy on benchmarks.
Eliminates post-processing in scene text detection.
Maintains high inference speed.
Abstract
Connected component (CC) is a proper text shape representation that aligns with human reading intuition. However, CC-based text detection methods have recently hit a developmental bottleneck: their time-consuming post-processing is difficult to eliminate. To address this issue, we introduce an explicit relational reasoning network (ERRNet) to elegantly model the component relationships without post-processing. Concretely, we first represent each text instance as multiple ordered text components, and then treat these components as objects in sequential movement. In this way, scene text detection can be innovatively viewed as a tracking problem. From this perspective, we design an end-to-end tracking decoder to achieve a CC-based method dispensing with post-processing entirely. Additionally, we observe that there is an inconsistency between classification confidence and localization…
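The sketch below illustrates why an ordered "track" of components can remove the grouping step that CC-based post-processing normally performs. It is a minimal illustration under stated assumptions and is not ERRNet's tracking decoder. It assumes that a text instance is given as top and bottom boundary polylines, sampled with the same number of points in reading order. It also assumes that each component is the quadrilateral between two consecutive sample pairs. The names `split_into_components` and `assemble_polygon` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Hypothetical component layout: (top_i, top_{i+1}, bottom_{i+1}, bottom_i)
Quad = Tuple[Point, Point, Point, Point]


def split_into_components(top: List[Point], bottom: List[Point]) -> List[Quad]:
    """Assumption: cut one text instance into K = len(top) - 1 ordered
    quadrilateral components, one per pair of consecutive boundary samples."""
    if len(top) != len(bottom) or len(top) < 2:
        raise ValueError("top and bottom must have the same length (>= 2)")
    return [
        (top[i], top[i + 1], bottom[i + 1], bottom[i])
        for i in range(len(top) - 1)
    ]


def assemble_polygon(track: List[Quad]) -> List[Point]:
    """Rebuild the instance polygon from an ordered track of components.
    The order is already known, so no linking or grouping pass is needed:
    walk the top edge left to right, then the bottom edge right to left."""
    top = [track[0][0]] + [q[1] for q in track]
    bottom = [track[0][3]] + [q[2] for q in track]
    return top + bottom[::-1]


if __name__ == "__main__":
    # A slightly curved word sampled at three points along each boundary.
    top = [(0.0, 0.0), (10.0, 0.0), (20.0, 2.0)]
    bottom = [(0.0, 5.0), (10.0, 5.0), (20.0, 7.0)]
    track = split_into_components(top, bottom)
    print(len(track), assemble_polygon(track))
```

In this framing, each component plays the role of an object observed at one "time step". A detector that emits components already ordered per instance, as the abstract describes for the tracking decoder, yields the final polygon by direct concatenation. It does not need a separate linking heuristic.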
Taxonomy
Topics: Advanced Text Analysis Techniques · Semantic Web and Ontologies · Rough Sets and Fuzzy Logic
