Deep Neural Network for Semantic-based Text Recognition in Images
Yi Zheng, Qitong Wang, Margrit Betke

TL;DR
This paper introduces a semantic-based text recognition model that leverages context understanding to improve accuracy in reading text from images, outperforming traditional single-word recognition methods on diverse datasets.
Contribution
The paper presents a novel deep learning framework combining text grouping, recognition, and semantic correction for improved scene text recognition.
Findings
Achieved 90% accuracy on catalog images.
Achieved 71% accuracy on protest images.
Outperforms baseline single-word recognition methods.
Abstract
State-of-the-art text spotting systems typically aim to detect isolated words or word-by-word text in images of natural scenes and ignore the semantic coherence within a region of text. However, when interpreted together, seemingly isolated words may be easier to recognize. On this basis, we propose a novel "semantic-based text recognition" (STR) deep learning model that reads text in images with the help of understanding context. STR consists of several modules. We introduce the Text Grouping and Arranging (TGA) algorithm to connect and order isolated text regions. A text-recognition network interprets isolated words. Benefiting from semantic information, a sequence-to-sequence network model efficiently corrects inaccurate and uncertain phrases produced earlier in the STR pipeline. We present experiments on two new distinct datasets that contain scanned catalog images of interior…
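To make the idea of connecting and ordering isolated text regions concrete, here is a minimal sketch in Python. It is not the paper's TGA algorithm, whose details are not given in the abstract. It only illustrates one simple assumption: word boxes whose vertical centres are close belong to the same line, and lines are read top to bottom, left to right. The names `WordBox`, `group_and_order`, and the `tol` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """A detected word with its bounding box (hypothetical structure)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    h: float  # box height


def group_and_order(boxes, tol=0.5):
    """Greedy line grouping, an illustrative stand-in for TGA.

    A box joins an existing line if its vertical centre lies within
    tol * (height of the line's first box) of that box's centre.
    Returns one string per line, top to bottom, words left to right.
    """
    lines = []
    # Visit boxes from top to bottom so lines are created in reading order.
    for box in sorted(boxes, key=lambda b: b.y + b.h / 2):
        cy = box.y + box.h / 2
        for line in lines:
            ref = line[0]
            if abs(cy - (ref.y + ref.h / 2)) <= tol * ref.h:
                line.append(box)
                break
        else:
            # No existing line is close enough: start a new one.
            lines.append([box])
    # Order the words inside each line by their left edge.
    return [" ".join(b.text for b in sorted(line, key=lambda b: b.x))
            for line in lines]


boxes = [
    WordBox("SALE", 120, 10, 20),
    WordBox("SOFA", 10, 12, 20),
    WordBox("ONLY", 10, 50, 20),
    WordBox("$99", 100, 48, 20),
]
print(group_and_order(boxes))  # ['SOFA SALE', 'ONLY $99']
```

In a pipeline like the one the abstract describes, grouped phrases of this kind would be what a downstream sequence-to-sequence model receives for correction; how the actual system groups regions may differ substantially from this heuristic.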
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Natural Language Processing Techniques
