Scene Text Retrieval via Joint Text Detection and Similarity Learning
Hao Wang, Xiang Bai, Mingkun Yang, Shenggao Zhu, Jing Wang, Wenyu Liu

TL;DR
This paper introduces an end-to-end trainable framework that jointly learns scene text detection and cross-modal similarity to improve scene text retrieval accuracy, outperforming existing methods on benchmark datasets.
Contribution
The paper proposes an end-to-end trainable network that jointly optimizes scene text detection and cross-modal similarity learning, so that retrieval reduces to ranking detected text instances by their learned similarity to the query text.
Findings
Outperforms state-of-the-art scene text retrieval methods
Achieves significantly better results than separated detection and retrieval approaches
Gains are consistent across three benchmark datasets
Abstract
Scene text retrieval aims to localize and search all text instances from an image gallery, which are the same or similar to a given query text. Such a task is usually realized by matching a query text to the recognized words, outputted by an end-to-end scene text spotter. In this paper, we address this problem by directly learning a cross-modal similarity between a query text and each text instance from natural images. Specifically, we establish an end-to-end trainable network, jointly optimizing the procedures of scene text detection and cross-modal similarity learning. In this way, scene text retrieval can be simply performed by ranking the detected text instances with the learned similarity. Experiments on three benchmark datasets demonstrate our method consistently outperforms the state-of-the-art scene text spotting/retrieval approaches. In particular, the proposed framework of…
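The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes the query text and each detected text instance have already been mapped to vectors in a shared embedding space; the cosine similarity, the max-over-instances image score, and the names `cosine` and `rank_images` are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_images(query_vec, gallery):
    """Rank gallery images by their best-matching detected text instance.

    gallery maps an image id to a list of instance embeddings (hypothetical
    outputs of a detection + embedding network). An image's score is the
    maximum similarity over its instances; this aggregation is an assumption.
    Images with no detected text are scored -inf so they rank last.
    """
    scored = []
    for image_id, instances in gallery.items():
        score = max(
            (cosine(query_vec, inst) for inst in instances),
            default=float("-inf"),
        )
        scored.append((image_id, score))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy data: 2-D embeddings chosen by hand for illustration only.
query = [1.0, 0.0]
gallery = {
    "img_a": [[0.0, 1.0], [1.0, 0.1]],  # second instance nearly matches the query
    "img_b": [[-1.0, 0.0]],             # opposite direction
    "img_c": [],                        # no text detected
}
ranking = rank_images(query, gallery)
for image_id, score in ranking:
    print(image_id, score)
```

Because each image is scored by its single best instance, one well-matched word is enough to surface an image, regardless of how many unrelated text instances it also contains.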
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
