Large Scale Scene Text Verification with Guided Attention
Dafang He, Yeqing Li, Alexander Gorban, Derrall Heath, Julian Ibarz, Qian Yu, Daniel Kifer, C. Lee Giles

TL;DR
This paper introduces a novel end-to-end framework for scene text verification that does not require explicit text detection or recognition, enabling effective text presence determination in images with weak supervision.
Contribution
The work presents the first end-to-end scene text verification framework that learns text-image relationships without bounding box annotations, and introduces the Guided Attention model for real-world applications.
Findings
Outperforms state-of-the-art scene text reading solutions on a challenging Street View Business Matching dataset.
Successfully handles weakly labeled data without explicit scene text detection or recognition.
Provides a new perspective for studying scene text problems through a real-world verification task.
Abstract
Many tasks are related to determining if a particular text string exists in an image. In this work, we propose a new framework that learns this task in an end-to-end way. The framework takes an image and a text string as input and then outputs the probability of the text string being present in the image. This is the first end-to-end framework that learns such relationships between text and images in the scene text area. The framework does not require explicit scene text detection or recognition, and thus needs no bounding box annotations. It is also the first work in the scene text area that tackles such a weakly labeled problem. Based on this framework, we developed a model called Guided Attention. Our model achieves much better results than several state-of-the-art scene text reading based solutions on a challenging Street View Business Matching task. The task tries to…
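The input/output contract described in the abstract (an image and a text string go in, a presence probability comes out) can be illustrated with a small sketch. This is not the paper's Guided Attention model, whose architecture is not given in this summary. It is a generic text-conditioned attention toy, and every name in it (`embed_text`, `verify`, the region features, the weights) is a hypothetical stand-in chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def embed_text(text, dim, seed=0):
    """Hypothetical text encoder: mean of fixed pseudo-random per-character vectors."""
    vec = [0.0] * dim
    for ch in text:
        rng = random.Random(seed * 1000003 + ord(ch))
        for i in range(dim):
            vec[i] += rng.uniform(-1.0, 1.0)
    n = max(len(text), 1)
    return [v / n for v in vec]


def verify(image_features, text, weights, bias):
    """Return (probability that `text` is present, attention over regions).

    image_features: list of region feature vectors, assumed to come from
    some image encoder (not specified here). No bounding boxes are used.
    """
    dim = len(image_features[0])
    query = embed_text(text, dim)
    # The text query scores each image region; softmax turns scores into attention.
    attn = softmax([dot(query, f) for f in image_features])
    # Attention-weighted sum of region features.
    context = [sum(a * f[i] for a, f in zip(attn, image_features)) for i in range(dim)]
    # Combine image context with the text query, then map to a probability.
    joint = [c * q for c, q in zip(context, query)]
    return sigmoid(dot(weights, joint) + bias), attn
```

In a weakly labeled setting of the kind the abstract describes, such a function would be trained only from (image, text, present-or-not) triples, so the attention weights are learned without any region-level supervision. The weights here are untrained, so the returned probability is meaningless beyond showing the shape of the interface.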
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
