Cross-Media Similarity Evaluation for Web Image Retrieval in the Wild
Jianfeng Dong, Xirong Li, Duanqing Xu

TL;DR
This paper evaluates cross-media similarity methods for web image retrieval, revealing that current models excel mainly on visual-oriented queries, which are a small part of real-user queries, and introduces a simple yet effective text2image approach.
Contribution
It proposes a query visualness measure for categorizing queries, connects query types to model performance, and introduces a simple text2image method for computing cross-media similarity.
Findings
State-of-the-art models perform well mainly on visual-oriented queries.
The proposed text2image method outperforms recent deep learning approaches.
Visual-oriented queries constitute a small fraction of real-user queries.
Abstract
In order to retrieve unlabeled images by textual queries, cross-media similarity computation is a key ingredient. Although novel methods are continuously introduced, little has been done to evaluate these methods together with large-scale query log analysis. Consequently, how far these methods have brought us in answering real-user queries is unclear. Given baseline methods that compute cross-media similarity using relatively simple text/image matching, how much progress advanced models have made is also unclear. This paper takes a pragmatic approach to answering the two questions. Queries are automatically categorized according to the proposed query visualness measure, and later connected to the evaluation of multiple cross-media similarity models on three test sets. Such a connection reveals that the success of the state-of-the-art is mainly attributed to their good performance on…
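The idea of connecting query categories to model evaluation can be illustrated with a minimal sketch. It assumes each query already has a visualness score in [0, 1] and a per-query retrieval metric. The thresholds, function names, and toy numbers below are hypothetical and are not taken from the paper, whose actual visualness measure is not given here.

```python
from collections import defaultdict
from statistics import mean


def bucket(visualness, low=0.3, high=0.7):
    """Map a visualness score in [0, 1] to a coarse category.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if visualness >= high:
        return "visual"
    if visualness <= low:
        return "non-visual"
    return "mixed"


def per_category_scores(records):
    """Average a per-query metric within each visualness category.

    records: iterable of (visualness, metric) pairs, one pair per query.
    Returns a dict mapping category name to the mean metric.
    """
    groups = defaultdict(list)
    for visualness, metric in records:
        groups[bucket(visualness)].append(metric)
    return {category: mean(values) for category, values in groups.items()}


# Toy numbers, made up purely for illustration.
toy = [(0.9, 0.8), (0.8, 0.6), (0.5, 0.4), (0.1, 0.2), (0.2, 0.4)]
result = per_category_scores(toy)
for category, score in sorted(result.items()):
    print(f"{category}: {score:.2f}")
```

Reporting the metric per category rather than as one overall average is what makes it possible to see whether a model's gains are concentrated on visual-oriented queries.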
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
