DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training

Yu Xie; Qian Qiao; Jun Gao; Tianxiang Wu; Jiaqing Fan; Yue Zhang; Jielei Zhang; Huyang Sun

arXiv:2408.00355 · cs.CV · June 3, 2025

TL;DR

This paper introduces DNTextSpotter, a novel denoising training method for arbitrary-shaped scene text spotting that improves model stability and outperforms existing methods on multiple benchmarks.

Contribution

The paper proposes a new denoising training approach with specialized query decomposition and background perception enhancement for better text spotting performance.

Findings

01 Outperforms state-of-the-art methods on four benchmarks.

02 Achieves an 11.3% improvement over the best prior approach on the Inverse-Text dataset.

03 Improves training stability and accuracy when detecting irregularly shaped text.

Abstract

A growing number of end-to-end text spotting methods based on the Transformer architecture have demonstrated superior performance. These methods use a bipartite graph matching algorithm to perform optimal one-to-one matching between predicted objects and ground-truth objects. However, the instability of bipartite graph matching can lead to inconsistent optimization targets, which can hurt training performance. Existing literature applies denoising training to address this matching instability in object detection tasks. Unfortunately, this denoising training method cannot be applied directly to text spotting, which requires detecting irregular shapes and recognizing text, a task more complex than classification. To address this issue, we propose a novel denoising training method (DNTextSpotter) for arbitrary-shaped text spotting.…
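The matching instability described in the abstract can be illustrated with a small sketch. This is not the paper's code: it assumes axis-aligned boxes instead of arbitrary-shaped text, a plain L1 cost, and a brute-force search over assignments in place of the Hungarian algorithm that DETR-style models normally use. The names `l1_cost` and `match` are hypothetical helpers introduced only for this illustration.

```python
from itertools import permutations

def l1_cost(pred, target):
    """L1 distance between two boxes given as (x1, y1, x2, y2)."""
    return sum(abs(p - t) for p, t in zip(pred, target))

def match(preds, targets):
    """Brute-force optimal one-to-one matching (illustrative only).

    Returns a tuple m where m[j] is the index of the prediction
    assigned to targets[j]. Assumes len(preds) >= len(targets).
    """
    best_cost, best_assign = float("inf"), None
    for assign in permutations(range(len(preds)), len(targets)):
        cost = sum(l1_cost(preds[i], targets[j]) for j, i in enumerate(assign))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_assign

# Two overlapping ground-truth boxes (made-up coordinates).
targets = [(0, 0, 100, 100), (80, 0, 180, 100)]

# Predictions at one training step: both sit between the two targets.
preds_before = [(39, 0, 139, 100), (41, 0, 141, 100)]
# After a small update each prediction has moved by only 2 pixels.
preds_after = [(41, 0, 141, 100), (39, 0, 139, 100)]

print(match(preds_before, targets))  # (0, 1): prediction 0 is matched to target 0
print(match(preds_after, targets))   # (1, 0): prediction 0 is now matched to target 1
```

A 2-pixel shift flips which ground-truth object each prediction is trained toward, which is the kind of inconsistent optimization target the abstract refers to. Denoising training in object detection sidesteps this for an auxiliary set of queries by building them from noised ground truth, so their targets are fixed by construction rather than decided by matching.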

Code & Models

Repositories

yyyyyxie/DNTextSpotter
pytorch

Taxonomy

Topics: Handwritten Text Recognition Techniques · Video Analysis and Summarization · Speech Recognition and Synthesis

Methods: Linear Layer · Residual Connection · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections