VISIR: Visual and Semantic Image Label Refinement

Sreyasi Nag Chowdhury; Niket Tandon; Hakan Ferhatosmanoglu; Gerhard Weikum

arXiv:1909.00741 · cs.MM · September 4, 2019

TL;DR

VISIR improves the quality of image labels by semantically refining and expanding the labels produced by object detection. It draws on lexical and commonsense knowledge through an optimization approach and thereby improves on the output of state-of-the-art visual labeling tools.

Contribution

The paper introduces VISIR, a novel method that refines and expands image labels using semantic coherence and knowledge bases, addressing limitations of existing tagging methods.

Findings

01

VISIR improves label quality over LSDA and YOLO.

02

Semantic refinement enhances image retrieval accuracy.

03

An optimization-based approach effectively integrates multiple knowledge sources.
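As a rough illustration of label refinement posed as an optimization problem, the following sketch selects a subset of candidate labels that trades off detector confidence against pairwise semantic coherence. It is not the authors' formulation. The objective, the `threshold` and `weight` parameters, the function name `refine_labels`, and all scores are hypothetical stand-ins for the lexical and commonsense knowledge sources the paper draws on. Brute-force enumeration stands in for whatever solver the paper uses. The sketch covers only the filtering side of refinement, not label expansion.

```python
from itertools import combinations


def refine_labels(confidence, coherence, k, threshold=0.5, weight=1.0):
    """Pick the subset of at most k candidate labels with the highest score.

    Hypothetical objective (not the paper's): each kept label contributes
    (confidence - threshold), and each kept pair adds weight * coherence.
    Exhaustive search is only practical for a handful of candidates.
    """
    labels = sorted(confidence)
    best_subset, best_score = (), float("-inf")
    for size in range(k + 1):  # size 0 lets the empty set win if nothing fits
        for subset in combinations(labels, size):
            # Unary term: labels below the threshold lower the score.
            score = sum(confidence[label] - threshold for label in subset)
            # Pairwise term: semantically related labels support each other.
            score += weight * sum(
                coherence.get(frozenset(pair), 0.0)
                for pair in combinations(subset, 2)
            )
            if score > best_score:
                best_subset, best_score = subset, score
    return set(best_subset), best_score


# Made-up detector confidences and relatedness scores for one image.
confidence = {"dog": 0.9, "frisbee": 0.6, "grass": 0.55, "car": 0.4}
coherence = {
    frozenset({"dog", "frisbee"}): 0.8,
    frozenset({"dog", "grass"}): 0.5,
    frozenset({"frisbee", "grass"}): 0.4,
    frozenset({"car", "dog"}): 0.05,
}
labels, score = refine_labels(confidence, coherence, k=3, weight=0.5)
```

With these made-up scores, the sketch keeps dog, frisbee, and grass, which reinforce one another, and drops the low-confidence, weakly related car. A real system would replace exhaustive search with a proper solver, because the number of subsets grows combinatorially with the number of candidates.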

Abstract

The social media explosion has populated the Internet with a wealth of images. There are two existing paradigms for image retrieval: 1) content-based image retrieval (CBIR), which has traditionally used visual features for similarity search (e.g., SIFT features), and 2) tag-based image retrieval (TBIR), which has relied on user tagging (e.g., Flickr tags). CBIR now gains semantic expressiveness by advances in deep-learning-based detection of visual labels. TBIR benefits from query-and-click logs to automatically infer more informative labels. However, learning-based tagging still yields noisy labels and is restricted to concrete objects, missing out on generalizations and abstractions. Click-based tagging is limited to terms that appear in the textual context of an image or in queries that lead to a click. This paper addresses the above limitations by semantically refining and expanding…
