Dynamically Visual Disambiguation of Keyword-based Image Search
Yazhou Yao, Zeren Sun, Fumin Shen, Li Liu, Limin Wang, Fan Zhu, Lizhong Ding, Gangshan Wu, Ling Shao

TL;DR
This paper introduces an adaptive multi-model framework that dynamically disambiguates keywords in image search results by employing saliency-guided deep learning, improving the accuracy of web-based image retrieval.
Contribution
It proposes a novel adaptive framework combining dynamic query selection and saliency-guided deep learning for visual disambiguation in keyword-based image search.
Findings
Outperforms existing methods in visual disambiguation tasks
Effectively adapts to dynamic changes in search results
Demonstrates superior accuracy in extensive experiments
Abstract
Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits the performance of such methods is visual polysemy. To address this issue, we present an adaptive multi-model framework that resolves polysemy by visual disambiguation. Compared to existing methods, the primary advantage of our approach is that it can adapt to dynamic changes in the search results. Our proposed framework consists of two major steps: we first discover and dynamically select text queries according to the image search results; we then employ the proposed saliency-guided deep multi-instance learning network to remove outliers and learn classification models for visual disambiguation. Extensive experiments demonstrate the superiority of our proposed approach.
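The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no scoring functions or thresholds, so every name here (`select_queries`, `bag_score`, `remove_outliers`), the threshold values, and the use of max pooling are assumptions for illustration. It also assumes the common multi-instance setup in which each image is a bag of region-level scores and an image is kept if at least one region scores highly.

```python
from typing import Dict, List


def select_queries(candidate_scores: Dict[str, float], threshold: float) -> List[str]:
    """Step 1 (hypothetical): keep candidate text queries whose relevance
    score, computed from the current search results, meets the threshold.
    Re-running this on fresh search results is what makes it 'dynamic'."""
    return sorted(q for q, s in candidate_scores.items() if s >= threshold)


def bag_score(instance_scores: List[float]) -> float:
    """Max pooling over instance scores: a bag (image) is considered
    positive if at least one of its instances (regions) is positive."""
    return max(instance_scores)


def remove_outliers(bags: Dict[str, List[float]], min_score: float) -> Dict[str, List[float]]:
    """Step 2 (hypothetical): drop images whose best region score is
    below min_score, treating them as outliers for this query."""
    return {name: scores for name, scores in bags.items()
            if bag_score(scores) >= min_score}


if __name__ == "__main__":
    # Made-up scores for candidate senses of the polysemous keyword "apple".
    queries = select_queries(
        {"apple fruit": 0.8, "apple laptop": 0.7, "apple of my eye": 0.1},
        threshold=0.5,
    )
    print(queries)

    # Made-up per-region scores for two retrieved images.
    kept = remove_outliers({"img1": [0.1, 0.9], "img2": [0.2, 0.3]}, min_score=0.5)
    print(kept)
```

In the paper's framework the region scores would come from a learned network guided by saliency; here they are supplied by hand so the control flow stays visible.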
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
