PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer

Ruijin Liu; Ning Lu; Dapeng Chen; Cheng Li; Zejian Yuan; Wei Peng

arXiv:2308.15004·cs.CV·August 30, 2023

TL;DR

PBFormer is a scene text detector that pairs the Polynomial Band (PB) shape representation with a transformer architecture. It captures complex text shapes and separates overlapping texts without post-processing.

Contribution

The paper proposes Polynomial Band as a new text shape representation and integrates it with a transformer-based detector, so that arbitrarily shaped texts are detected by a single end-to-end model.

Findings

01  Outperforms previous state-of-the-art methods on arbitrary-shaped text datasets.

02  Models complex text shapes with polynomial curves that use a fixed number of parameters.

03  Detects overlapping texts by telling them apart through their curve coefficients.

Abstract

We present PBFormer, an efficient yet powerful scene text detector that unifies the transformer with a novel text shape representation, Polynomial Band (PB). The representation has four polynomial curves to fit a text's top, bottom, left, and right sides, and it can capture a text with a complex shape by varying the polynomial coefficients. PB has appealing features compared with conventional representations: 1) It can model different curvatures with a fixed number of parameters, while polygon-points-based methods need a different number of points. 2) It can distinguish adjacent or overlapping texts because they have clearly different curve coefficients, while segmentation-based or points-based methods suffer from adhesive spatial positions. PBFormer combines the PB with the transformer, which can directly generate smooth text contours sampled from predicted curves without…
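The idea of sampling a contour from four side curves can be illustrated with a minimal sketch. This is not the paper's implementation: the parameterization (top and bottom as y = p(x), left and right as x = q(y)), the function name `sample_band_contour`, and the `x_range` argument are all assumptions made for illustration. It only shows how a fixed number of coefficients per side can describe shapes of different curvature.

```python
import numpy as np

def sample_band_contour(top, bottom, left, right, x_range, n=8):
    """Sample a closed contour from four polynomial side curves.

    Hypothetical parameterization (an assumption, not taken from the paper):
      top, bottom : coefficients of y = p(x), highest degree first
      left, right : coefficients of x = q(y), highest degree first
      x_range     : (x_start, x_end) over which top and bottom are evaluated
    Returns an array of shape (4 * n, 2) with (x, y) points, ordered
    top -> right -> bottom (reversed) -> left.
    """
    x0, x1 = x_range
    xs = np.linspace(x0, x1, n)
    top_pts = np.stack([xs, np.polyval(top, xs)], axis=1)
    bot_pts = np.stack([xs, np.polyval(bottom, xs)], axis=1)

    # Right side runs from the end of the top curve to the end of the bottom curve.
    ys_r = np.linspace(top_pts[-1, 1], bot_pts[-1, 1], n)
    right_pts = np.stack([np.polyval(right, ys_r), ys_r], axis=1)

    # Left side runs from the start of the bottom curve back to the start of the top curve.
    ys_l = np.linspace(bot_pts[0, 1], top_pts[0, 1], n)
    left_pts = np.stack([np.polyval(left, ys_l), ys_l], axis=1)

    return np.concatenate([top_pts, right_pts, bot_pts[::-1], left_pts], axis=0)

# A straight box and a curved band, both described by four coefficient vectors.
box = sample_band_contour([0.0], [2.0], [0.0], [4.0], (0.0, 4.0))
curved = sample_band_contour([0.1, 0.0, 0.0], [0.1, 0.0, 2.0],
                             [0.0], [4.0], (0.0, 4.0))
print(box.shape, curved.shape)
```

In this sketch, changing the curvature only changes coefficient values, not the number of parameters, which is the property the abstract contrasts with polygon-point representations.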

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dropout · Adam · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dense Connections · Residual Connection