AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

Wenhai Wang; Xuebo Liu; Xiaozhong Ji; Enze Xie; Ding Liang; Zhibo Yang; Tong Lu; Chunhua Shen; Ping Luo

arXiv:2008.00714·cs.CV·July 7, 2021·1 cites

TL;DR

AE TextSpotter combines visual and linguistic features to improve scene text spotting accuracy, especially in ambiguous cases, where it outperforms state-of-the-art methods by over 4%.

Contribution

To the authors' knowledge, this work is the first to incorporate a language model into text detection, reducing ambiguity and improving detection confidence in scene text spotting.

Findings

01 Outperforms state-of-the-art methods by over 4% on ambiguous samples

02 Learns linguistic and visual features jointly for better detection

03 Reduces false positives through a dedicated language module

Abstract

Scene text spotting aims to detect and recognize the entire word or sentence with multiple characters in natural images. It is still challenging because ambiguity often occurs when the spacing between characters is large or the characters are evenly spread in multiple rows and columns, making many visually plausible groupings of the characters (e.g. "BERLIN" is incorrectly detected as "BERL" and "IN" in Fig. 1(c)). Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection. The proposed AE TextSpotter has three important benefits. 1) The linguistic representation is learned together with the visual representation in a framework. To our knowledge, it is the first time to…
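
The "BERLIN" versus "BERL" + "IN" ambiguity described in the abstract can be illustrated with a toy re-scoring sketch. This is not the authors' implementation: the lexicon lookup stands in for a learned language module, and the names `TOY_LEXICON`, `Candidate`, `linguistic_score`, `rescore`, and `grouping_score`, the confidence values, and the linear weighting are all assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon standing in for a learned language module.
TOY_LEXICON = {"BERLIN", "IN"}


@dataclass
class Candidate:
    text: str            # recognized string for one candidate text line
    visual_score: float  # made-up detector confidence in [0, 1]


def linguistic_score(text: str) -> float:
    """Return 1.0 if the string is in the toy lexicon, else 0.0."""
    return 1.0 if text in TOY_LEXICON else 0.0


def rescore(c: Candidate, weight: float) -> float:
    """Blend visual and linguistic scores (assumed linear mix)."""
    return (1 - weight) * c.visual_score + weight * linguistic_score(c.text)


def grouping_score(group: list[Candidate], weight: float) -> float:
    """Average the blended score over all text lines in one grouping."""
    return sum(rescore(c, weight) for c in group) / len(group)


# Two visually plausible groupings of the same characters.
groupings = {
    "one_line": [Candidate("BERLIN", 0.60)],
    "split": [Candidate("BERL", 0.70), Candidate("IN", 0.70)],
}

# weight=0.0 uses visual evidence only; weight=0.5 adds the linguistic term.
visual_only = max(groupings, key=lambda k: grouping_score(groupings[k], 0.0))
combined = max(groupings, key=lambda k: grouping_score(groupings[k], 0.5))

print(visual_only)  # split
print(combined)     # one_line
```

With these made-up numbers, visual evidence alone prefers the split grouping, while adding the linguistic term penalizes the non-word "BERL" and favors the single line "BERLIN".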

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Autoencoders