TL;DR
This paper introduces a refined feature-attentive network (RFN) for text detection in industrial scenes. The network integrates multi-resolution features and uses an attention map to localize text more accurately, and it is validated on newly constructed and public datasets.
Contribution
The paper presents a novel RFN architecture with parallel feature integration and attention refinement, specifically designed for challenging industrial scene text detection.
Findings
The RFN is reported to achieve state-of-the-art performance on industrial and public datasets.
It improves localization accuracy in low-contrast, cluttered scenes.
The authors construct large-scale industrial scene text datasets for training and evaluation.
Abstract
Detecting the marking characters of industrial metal parts remains challenging due to the low visual contrast, uneven illumination, corroded character structures, and cluttered backgrounds of metal part images. Because of these factors, the bounding boxes generated by most existing methods locate low-contrast text areas inaccurately. In this paper, we propose a refined feature-attentive network (RFN) to solve the inaccurate localization problem. Specifically, we design a parallel feature integration mechanism that constructs an adaptive feature representation from multi-resolution features, which enhances the perception of multi-scale texts at each scale-specific level and yields a high-quality attention map. Then, an attentive refinement network guided by the attention map is developed to rectify the location deviation of candidate boxes. In addition, a re-scoring mechanism is designed to select text…
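To make the idea of building one attention map from multi-resolution features more concrete, here is a minimal sketch in plain Python. It is not the RFN architecture from the paper: the function names `upsample_nearest` and `fuse_to_attention`, the nearest-neighbour resizing, the softmax weighting over levels, and the fixed `level_logits` are all assumptions made for illustration. In a trained network the per-level weights and the feature maps would be learned rather than supplied by hand.

```python
import math


def upsample_nearest(fmap, out_h, out_w):
    """Resize a 2-D map (list of rows) to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse_to_attention(levels, level_logits):
    """Fuse multi-resolution maps into a single attention map with values in (0, 1).

    levels       -- list of 2-D maps; levels[0] sets the output resolution
    level_logits -- one score per level (hypothetical stand-in for learned weights)
    """
    out_h, out_w = len(levels[0]), len(levels[0][0])

    # Softmax over the per-level scores so the fusion weights sum to 1.
    peak = max(level_logits)
    exps = [math.exp(v - peak) for v in level_logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bring every level to the same resolution before combining them.
    resized = [upsample_nearest(f, out_h, out_w) for f in levels]

    attention = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            fused = sum(w * f[r][c] for w, f in zip(weights, resized))
            row.append(1.0 / (1.0 + math.exp(-fused)))  # sigmoid squashes to (0, 1)
        attention.append(row)
    return attention


# Toy input: a flat fine-resolution map and a coarse map that responds
# strongly in its top-left cell.
fine = [[0.0] * 4 for _ in range(4)]
coarse = [[2.0, -2.0], [-2.0, -2.0]]
attention = fuse_to_attention([fine, coarse], [0.0, 0.0])
```

With equal level scores, each level contributes half of the fused response, so the top-left quadrant of `attention` ends up higher than the rest of the map. A refinement stage could then use such a map to weight candidate-box features, although how the paper does this is not specified in the text above.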
