Efficiently Leveraging Linguistic Priors for Scene Text Spotting

Nguyen Nguyen; Yapeng Tian; Chenliang Xu

arXiv:2402.17134·cs.CV·February 28, 2024·2 cites

Open Access

TL;DR

This paper introduces a method that uses linguistic priors drawn from a large text corpus to replace the one-hot encoding used in auto-regressive scene text spotting and recognition models. The change improves both recognition accuracy and word localization, and it is simple to integrate into existing models.

Contribution

It proposes a novel approach to incorporating linguistic knowledge into scene text spotting models: one-hot encoding is replaced with more informative text distributions, which are generated to align with scene text datasets so that no in-domain fine-tuning is needed.

Findings

01

Achieves state-of-the-art results on multiple benchmarks.

02

Improves both recognition accuracy and word localization.

03

Easily integrates into existing auto-regressive models.

Abstract

Incorporating linguistic knowledge can improve scene text recognition, but it is questionable whether the same holds for scene text spotting, which typically involves text detection and recognition. This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models. This allows the model to capture the relationship between characters in the same word. Additionally, we introduce a technique to generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning. As a result, the newly created text distributions are more informative than pure one-hot encoding, leading to improved spotting and recognition performance. Our method is simple and efficient, and it can easily be integrated into existing…
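
The idea of swapping a one-hot target for a more informative text distribution can be illustrated with a small sketch. This is not the authors' procedure: the excerpt above does not describe how their distributions are generated, so the example assumes a smoothed character-bigram prior estimated from a toy word list and blends it with the one-hot target. The names `bigram_prior`, `soft_targets`, `cross_entropy`, and the mixing weight `alpha` are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
START = "^"  # hypothetical start-of-word marker


def bigram_prior(corpus, smoothing=1.0):
    """Estimate P(next char | previous char) from a word list.

    Assumption: a smoothed bigram model stands in for the linguistic
    prior; the paper's actual text distributions may be built differently.
    """
    counts = defaultdict(Counter)
    for word in corpus:
        for prev, nxt in zip(START + word, word):
            counts[prev][nxt] += 1

    def prior(prev):
        # Add-k smoothing over the alphabet, then normalize to sum to 1.
        raw = [counts[prev][c] + smoothing for c in ALPHABET]
        total = sum(raw)
        return [r / total for r in raw]

    return prior


def soft_targets(word, prior, alpha=0.9):
    """Blend each one-hot target with the prior for its position.

    Row i equals alpha * one_hot(word[i]) + (1 - alpha) * P(. | word[i-1]).
    """
    rows = []
    prev = START
    for ch in word:
        row = [(1.0 - alpha) * q for q in prior(prev)]
        row[ALPHABET.index(ch)] += alpha
        rows.append(row)
        prev = ch
    return rows


def cross_entropy(target, predicted):
    """Cross-entropy between a soft target row and a predicted distribution."""
    return -sum(t * math.log(q) for t, q in zip(target, predicted) if t > 0)


# Usage: build a prior from a toy corpus and soften the targets for one word.
prior = bigram_prior(["the", "this", "that"])
targets = soft_targets("ten", prior, alpha=0.9)
uniform = [1.0 / len(ALPHABET)] * len(ALPHABET)
loss = sum(cross_entropy(row, uniform) for row in targets)
```

Each target row still peaks at the ground-truth character, but the remaining mass is spread according to which characters tend to follow the previous one, which is one way a target can encode relationships between characters in the same word.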

Taxonomy

Topics: Video Analysis and Summarization · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications

Methods: ALIGN