Scene Text Recognition with Semantics

Joshua Cesare Placidi; Yishu Miao; Zixu Wang; Lucia Specia

arXiv:2210.10836 · cs.CV · October 21, 2022

Open Access

TL;DR

This paper introduces a multimodal scene text recognition approach that incorporates semantic scene context via object tags into a transformer model, improving accuracy especially on noisy or obscured text images.

Contribution

It presents a novel method that fuses semantic scene information with visual data in a transformer architecture for enhanced scene text recognition.

Findings

01 Higher performance on noisy text images

02 Effective integration of semantic scene context

03 Outperforms traditional models on benchmark datasets

Abstract

Scene Text Recognition (STR) models have achieved high performance in recent years on benchmark datasets where text images are presented with minimal noise. Traditional STR pipelines take a cropped image as their sole input and attempt to identify the characters present. This setup can fail when the input image is noisy or the text is partially obscured. This paper proposes using semantic information from the greater scene to contextualise predictions. We generate semantic vectors using object tags and fuse this information into a transformer-based architecture. The results demonstrate that our multimodal approach yields higher performance than traditional benchmark models, particularly on noisy instances.
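The abstract does not spell out how the semantic vectors are built or fused, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes object tags arrive as plain strings, stands in a hash-based toy embedding for a learned one, mean-pools the tag embeddings into a single semantic vector, and fuses it by prepending it as one extra token to the visual feature sequence that a transformer would consume. The names `tag_embedding`, `semantic_vector`, `fuse`, and `DIM` are all hypothetical.

```python
import hashlib
import math
from typing import List

DIM = 8  # hypothetical embedding width, chosen only for illustration


def tag_embedding(tag: str, dim: int = DIM) -> List[float]:
    """Toy stand-in for a learned embedding: hash the tag into a unit vector."""
    digest = hashlib.sha256(tag.lower().encode("utf-8")).digest()
    # Map the first `dim` bytes from [0, 255] to [-1, 1].
    raw = [(b / 255.0) * 2.0 - 1.0 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def semantic_vector(tags: List[str], dim: int = DIM) -> List[float]:
    """Mean-pool the tag embeddings; all zeros when no tags were detected."""
    if not tags:
        return [0.0] * dim
    vecs = [tag_embedding(t, dim) for t in tags]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def fuse(visual_tokens: List[List[float]], sem: List[float]) -> List[List[float]]:
    """Assumed fusion: prepend the semantic vector as one extra input token."""
    if any(len(tok) != len(sem) for tok in visual_tokens):
        raise ValueError("visual tokens and semantic vector must share a width")
    return [sem] + visual_tokens


# Four placeholder visual tokens, as if produced by an image encoder.
visual = [[0.0] * DIM for _ in range(4)]
fused = fuse(visual, semantic_vector(["street sign", "car", "building"]))
print(len(fused), len(fused[0]))
```

Prepending a token is only one plausible fusion strategy; concatenating the semantic vector to every visual token or using cross-attention would fit the same description, and the abstract does not say which the paper uses.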

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Text and Document Classification Technologies