SRFormer: Text Detection Transformer with Incorporated Segmentation and Regression

Qingwen Bu; Sungrae Park; Minsoo Khang; Yichuan Cheng

arXiv:2308.10531 · cs.CV · December 27, 2023

TL;DR

SRFormer is a unified transformer-based text detection model that combines segmentation and regression techniques, leveraging early-layer segmentation predictions and a novel query enhancement to achieve high accuracy and efficiency.

Contribution

The paper introduces SRFormer, a novel DETR-based model that integrates segmentation and regression for improved robustness and efficiency in text detection.

Findings

1. Achieves state-of-the-art performance on multiple benchmarks.
2. Demonstrates superior robustness and data efficiency.
3. Reduces computational load through strategic model design.

Abstract

Existing techniques for text detection can be broadly classified into two primary groups: segmentation-based and regression-based methods. Segmentation models offer enhanced robustness to font variations but require intricate post-processing, leading to high computational overhead. Regression-based methods undertake instance-aware prediction but face limitations in robustness and data efficiency due to their reliance on high-level representations. In our academic pursuit, we propose SRFormer, a unified DETR-based model with amalgamated Segmentation and Regression, aiming at the synergistic harnessing of the inherent robustness in segmentation representations, along with the straightforward post-processing of instance-level regression. Our empirical analysis indicates that favorable segmentation predictions can be obtained at the initial decoder layers. In light of this, we constrain the…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques