Know2Look: Commonsense Knowledge for Visual Search
Sreyasi Nag Chowdhury, Niket Tandon, Gerhard Weikum

TL;DR
This paper introduces Know2Look, a method that enhances visual search by integrating background commonsense knowledge with text and visual cues, aiming to improve document retrieval accuracy.
Contribution
It proposes a novel multi-modal approach combining text, visual cues, and commonsense knowledge to improve image-based search and retrieval.
Findings
Improved retrieval accuracy with the integration of commonsense knowledge.
Effective combination of text, visual cues, and background knowledge.
Enhanced search results over traditional text-only methods.
Abstract
With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, search and retrieval of documents still depend largely on textual cues alone. Although visual cues have started to gain focus, imperfections in object/scene detection mean that they do not yet lead to significantly improved results. We hypothesize that the use of background commonsense knowledge on query terms can significantly aid in retrieval of documents with associated images. To this end we deploy three different modalities - text, visual cues, and commonsense knowledge pertaining to the query - as a recipe for efficient search and retrieval.
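The abstract does not say how the three modalities are combined, so the following is only an illustrative sketch, not the authors' method. It assumes a simple weighted sum of three per-modality scores. The names `Document`, `text_score`, `visual_score`, `commonsense_score`, and `rank`, the weights, and the toy knowledge entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text_terms: set                                    # words from the text accompanying the image
    visual_labels: dict = field(default_factory=dict)  # detected label -> confidence in [0, 1]


def text_score(query, doc):
    # Fraction of query terms that appear in the accompanying text.
    return sum(t in doc.text_terms for t in query) / len(query)


def visual_score(query, doc):
    # Mean detector confidence over query terms (0 for terms not detected).
    return sum(doc.visual_labels.get(t, 0.0) for t in query) / len(query)


def commonsense_score(query, doc, knowledge):
    # For each query term, keep the best-supported related concept:
    # relation strength times the evidence for that concept in the document.
    total = 0.0
    for t in query:
        best = 0.0
        for concept, strength in knowledge.get(t, {}).items():
            if concept in doc.text_terms:
                evidence = 1.0
            else:
                evidence = doc.visual_labels.get(concept, 0.0)
            best = max(best, strength * evidence)
        total += best
    return total / len(query)


def rank(query, docs, knowledge, weights=(0.5, 0.3, 0.2)):
    # Assumed combination rule: weighted sum of the three scores.
    # The query must be a non-empty list of terms.
    w_text, w_vis, w_cs = weights
    scored = []
    for d in docs:
        s = (w_text * text_score(query, d)
             + w_vis * visual_score(query, d)
             + w_cs * commonsense_score(query, d, knowledge))
        scored.append((s, d.doc_id))
    return sorted(scored, reverse=True)


# Toy data: neither document mentions "picnic" in its text or visual labels.
knowledge = {"picnic": {"basket": 0.8, "grass": 0.6}}  # hypothetical commonsense entries
docs = [
    Document("a", {"family", "lunch", "outdoors"}, {"basket": 0.9, "grass": 0.7}),
    Document("b", {"office", "lunch"}, {"desk": 0.8}),
]
ranking = rank(["picnic"], docs, knowledge)
print(ranking)
```

In this toy setup, document "a" ranks first for the query "picnic" only because commonsense knowledge links the query to a basket and grass detected in its image. That is the kind of effect the hypothesis in the abstract points at.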
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
