SRFormer: Text Detection Transformer with Incorporated Segmentation and Regression
Qingwen Bu, Sungrae Park, Minsoo Khang, Yichuan Cheng

TL;DR
SRFormer is a unified transformer-based text detection model that combines segmentation and regression. It uses segmentation predictions from the early decoder layers together with a novel query-enhancement design to achieve high accuracy and efficiency.
Contribution
The paper introduces SRFormer, a novel DETR-based model that integrates segmentation and regression for improved robustness and efficiency in text detection.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Demonstrates superior robustness and data efficiency.
Reduces computational load through strategic model design.
Abstract
Existing text detection techniques fall broadly into two groups: segmentation-based and regression-based methods. Segmentation models are more robust to font variations but require intricate post-processing, which leads to high computational overhead. Regression-based methods make instance-aware predictions but are limited in robustness and data efficiency because they rely on high-level representations. We propose SRFormer, a unified DETR-based model that amalgamates Segmentation and Regression. It aims to harness both the inherent robustness of segmentation representations and the straightforward post-processing of instance-level regression. Our empirical analysis indicates that favorable segmentation predictions can be obtained at the initial decoder layers. In light of this, we constrain the…
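The abstract describes the architecture only at a high level. As a rough illustration of how segmentation and regression can coexist in one DETR-style decoder, here is a toy NumPy sketch. It is not the authors' implementation. Everything in it is an assumption: the function name `toy_seg_reg_decoder`, the parameters `num_seg_layers` and `num_points`, the single-step cross-attention, the dot-product mask head, and the "mask-pooled" query update, which only stands in for the paper's query enhancement. Random weights replace learned parameters. The sketch shows the control flow: every layer regresses boundary points per query, and only the first few layers also predict instance masks.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def toy_seg_reg_decoder(queries, pixel_feats, height, width,
                        num_layers=6, num_seg_layers=3, num_points=16, seed=0):
    """Hypothetical DETR-style decoder mixing segmentation and regression.

    queries:     (Q, C) text-instance queries
    pixel_feats: (H*W, C) flattened image features
    Returns one dict per layer with regressed points and, for the first
    `num_seg_layers` layers only, per-query mask probabilities.
    """
    rng = np.random.default_rng(seed)
    q_count, c = queries.shape
    # Random weights stand in for learned parameters (illustration only).
    w_ffn = rng.normal(scale=0.1, size=(num_layers, c, c))
    w_reg = rng.normal(scale=0.1, size=(num_layers, c, num_points * 2))

    outputs = []
    for layer in range(num_layers):
        # Simplified cross-attention: each query attends over all pixels.
        attn = softmax(queries @ pixel_feats.T / np.sqrt(c), axis=1)  # (Q, H*W)
        queries = queries + (attn @ pixel_feats) @ w_ffn[layer]

        out = {}
        if layer < num_seg_layers:
            # Segmentation head (assumed): query-pixel dot products -> masks.
            mask = sigmoid(queries @ pixel_feats.T)  # (Q, H*W)
            # Hypothetical query enhancement: add mask-weighted pooled features.
            pooled = (mask @ pixel_feats) / (mask.sum(axis=1, keepdims=True) + 1e-6)
            queries = queries + pooled
            out["mask"] = mask.reshape(q_count, height, width)
        # Regression head: normalized (x, y) boundary points for every query.
        out["points"] = sigmoid(queries @ w_reg[layer]).reshape(q_count, num_points, 2)
        outputs.append(out)
    return outputs


# Usage: 4 queries, 8 channels, a 6x5 feature map.
rng = np.random.default_rng(1)
outs = toy_seg_reg_decoder(rng.normal(size=(4, 8)), rng.normal(size=(6 * 5, 8)), 6, 5)
print(len(outs), "mask" in outs[0], "mask" in outs[-1], outs[-1]["points"].shape)
# 6 True False (4, 16, 2)
```

Restricting the mask head to the first layers keeps the per-pixel work out of the later layers. The regressed points need no contour extraction, which matches the abstract's contrast between segmentation's post-processing cost and regression's simpler output.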
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
