Know2Look: Commonsense Knowledge for Visual Search

Sreyasi Nag Chowdhury; Niket Tandon; Gerhard Weikum

arXiv:1909.00749·cs.IR·September 4, 2019

Open Access

TL;DR

This paper introduces Know2Look, a method that enhances visual search by integrating background commonsense knowledge with text and visual cues, aiming to improve document retrieval accuracy.

Contribution

It proposes a novel multi-modal approach combining text, visual cues, and commonsense knowledge to improve image-based search and retrieval.

Findings

1. Improved retrieval accuracy with the integration of commonsense knowledge.
2. Effective combination of text, visual cues, and background knowledge.
3. Enhanced search results over traditional text-only methods.

Abstract

With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, search and retrieval of documents still depend largely on textual cues alone. Although visual cues have started to gain attention, imperfections in object/scene detection mean that they do not yet lead to significantly improved results. We hypothesize that the use of background commonsense knowledge on query terms can significantly aid in retrieval of documents with associated images. To this end we deploy three different modalities - text, visual cues, and commonsense knowledge pertaining to the query - as a recipe for efficient search and retrieval.
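
To make the three-modality idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the scoring model used in the paper: the term-overlap measure, the linear weighting, the weight values, and every name below (`overlap_score`, `fused_score`, `rank`, `EXPANSIONS`, `DOCS`) are hypothetical. Each document carries caption text and detected visual labels; a small hand-written table stands in for a commonsense knowledge base that relates a query term to associated concepts.

```python
# Hypothetical toy data: a commonsense table and two image documents.
EXPANSIONS = {"picnic": ["basket", "grass", "sandwich"]}
DOCS = {
    "a": {"text": ["family", "day", "out"], "visual": ["basket", "grass", "tree"]},
    "b": {"text": ["office", "meeting"], "visual": ["desk", "laptop"]},
}


def overlap_score(terms, doc_terms):
    """Fraction of `terms` that occur in `doc_terms` (0.0 for an empty list)."""
    if not terms:
        return 0.0
    present = set(doc_terms)
    return sum(1 for t in terms if t in present) / len(terms)


def fused_score(query_terms, doc, expansions, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of text, visual, and commonsense evidence (assumed weights)."""
    w_text, w_visual, w_cs = weights
    text = overlap_score(query_terms, doc["text"])
    visual = overlap_score(query_terms, doc["visual"])
    # Commonsense channel: match concepts related to the query terms
    # against everything known about the document.
    related = [c for t in query_terms for c in expansions.get(t, [])]
    commonsense = overlap_score(related, doc["text"] + doc["visual"])
    return w_text * text + w_visual * visual + w_cs * commonsense


def rank(query_terms, docs, expansions):
    """Return document ids ordered from highest to lowest fused score."""
    return sorted(
        docs,
        key=lambda d: fused_score(query_terms, docs[d], expansions),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank(["picnic"], DOCS, EXPANSIONS))
```

In this toy setup the query term "picnic" appears in neither document's caption nor its visual labels, so text and visual matching alone cannot separate the two documents; only the commonsense channel, through the related concepts "basket" and "grass", gives document `a` a non-zero score.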

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications