PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer
Ruijin Liu, Ning Lu, Dapeng Chen, Cheng Li, Zejian Yuan, Wei Peng

TL;DR
PBFormer is a scene text detector that pairs a new shape representation, the Polynomial Band, with a transformer architecture. It captures complex text shapes and separates overlapping texts without postprocessing.
Contribution
The paper proposes the Polynomial Band as a new text shape representation and integrates it into a transformer-based detector, giving a unified, end-to-end approach to detecting arbitrarily shaped text.
Findings
Outperforms previous state-of-the-art methods on arbitrary-shaped text datasets.
Models complex text shapes with polynomial curves that use a fixed number of parameters.
Separates overlapping texts through their distinct curve coefficients.
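The fixed-parameter property and the coefficient-based separation listed above can be illustrated with an ordinary least-squares fit. This is a minimal sketch that assumes each edge is a cubic y = f(x) in normalized coordinates. The degree, the hypothetical helpers `fit_edge` and `arc`, and the Euclidean distance between coefficient vectors are illustrative assumptions, not PBFormer's actual formulation. The fit itself uses `numpy.polyfit`.

```python
import numpy as np

DEGREE = 3  # assumed polynomial degree, chosen for illustration

def fit_edge(xs, ys, degree=DEGREE):
    """Least-squares fit of y = f(x) to boundary points; returns degree + 1 coefficients."""
    return np.polyfit(xs, ys, degree)

def arc(n):
    """Hypothetical curved top edge sampled at n points in normalized [0, 1] coordinates."""
    xs = np.linspace(0.0, 1.0, n)
    return xs, 0.3 + 0.2 * (xs - 0.5) ** 2

# The coefficient count does not depend on how densely the edge is sampled.
coarse = fit_edge(*arc(8))
dense = fit_edge(*arc(50))
print(coarse.shape, dense.shape)  # (4,) (4,)

# Two texts whose top edges touch at both ends but differ in shape.
xs = np.linspace(0.0, 1.0, 20)
straight = fit_edge(xs, np.full_like(xs, 0.35))
curved = fit_edge(xs, 0.3 + 0.2 * (xs - 0.5) ** 2)
gap = np.linalg.norm(straight - curved)  # illustrative distance between coefficient vectors
print(round(float(gap), 3))  # 0.283
```

The two top edges share endpoints, so point-based boundaries would partly coincide. Their coefficient vectors, however, remain clearly apart. A point-based representation would instead need more vertices as curvature grows.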
Abstract
We present PBFormer, an efficient yet powerful scene text detector that unifies the transformer with a novel text shape representation, Polynomial Band (PB). The representation uses four polynomial curves to fit a text's top, bottom, left, and right sides, and it can capture a text with a complex shape by varying the polynomial coefficients. PB has appealing features compared with conventional representations: 1) It can model different curvatures with a fixed number of parameters, whereas polygon-points-based methods need a varying number of points. 2) It can distinguish adjacent or overlapping texts because they have clearly different curve coefficients, whereas segmentation-based and points-based methods suffer from adhesive spatial positions. PBFormer combines PB with the transformer and can directly generate smooth text contours sampled from predicted curves without…
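The abstract describes text contours sampled directly from four predicted curves. The sketch below shows one way such sampling could work, under assumptions the abstract does not state. Top and bottom sides are taken to be cubics y = f(x), and left and right sides cubics x = g(y), all in normalized image coordinates. Corners are found by a simple fixed-point iteration that assumes roughly perpendicular sides. `corner` and `sample_contour` are hypothetical helpers, not PBFormer's code.

```python
import numpy as np

def corner(horiz, vert, x0, iters=50):
    """Intersect a horizontal side y = horiz(x) with a vertical side x = vert(y).

    Uses fixed-point iteration x <- vert(horiz(x)); this assumes the two sides are
    roughly perpendicular so the iteration converges (an illustrative choice).
    """
    x = x0
    for _ in range(iters):
        x = np.polyval(vert, np.polyval(horiz, x))
    return x, np.polyval(horiz, x)

def sample_contour(top, bottom, left, right, n=10):
    """Sample n points per side, traced top, right, bottom, left, without repeated corners."""
    tl = corner(top, left, 0.0)
    tr = corner(top, right, 1.0)
    br = corner(bottom, right, 1.0)
    bl = corner(bottom, left, 0.0)
    # Horizontal sides are sampled along x, vertical sides along y.
    xs_top = np.linspace(tl[0], tr[0], n, endpoint=False)
    ys_right = np.linspace(tr[1], br[1], n, endpoint=False)
    xs_bottom = np.linspace(br[0], bl[0], n, endpoint=False)
    ys_left = np.linspace(bl[1], tl[1], n, endpoint=False)
    return np.concatenate([
        np.stack([xs_top, np.polyval(top, xs_top)], axis=1),
        np.stack([np.polyval(right, ys_right), ys_right], axis=1),
        np.stack([xs_bottom, np.polyval(bottom, xs_bottom)], axis=1),
        np.stack([np.polyval(left, ys_left), ys_left], axis=1),
    ])

# Usage: a gently curved band with straight vertical ends (coefficients, highest degree first).
top = np.array([0.0, 0.2, -0.2, 0.35])
bottom = np.array([0.0, 0.2, -0.2, 0.55])
left = np.array([0.0, 0.0, 0.0, 0.1])
right = np.array([0.0, 0.0, 0.0, 0.9])
contour = sample_contour(top, bottom, left, right, n=10)
print(contour.shape)  # (40, 2)
```

Whatever the curvature, the four cubics in this sketch carry 16 numbers in total. Denser contours come from increasing `n`, not from adding parameters, so the sampled polygon stays smooth and needs no interpolation.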
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dropout · Adam · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dense Connections · Residual Connection
