VISIR: Visual and Semantic Image Label Refinement
Sreyasi Nag Chowdhury, Niket Tandon, Hakan Ferhatosmanoglu, Gerhard Weikum

TL;DR
VISIR enhances image label quality by semantically refining and expanding labels from object detection, leveraging lexical and commonsense knowledge through an optimization approach, thereby improving state-of-the-art visual labeling tools.
Contribution
The paper introduces VISIR, a novel method that refines and expands image labels using semantic coherence and knowledge bases, addressing limitations of existing tagging methods.
Findings
VISIR improves label quality over LSDA and YOLO.
Semantic refinement enhances image retrieval accuracy.
Optimization-based approach effectively integrates multiple knowledge sources.
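To make the idea of combining detector confidence with semantic coherence concrete, here is a minimal toy sketch. It is not the authors' formulation, which this summary does not spell out. It assumes a hypothetical detector output (`detections`), a hand-made relatedness table (`relatedness`), and a small hypernym map (`hypernyms`) standing in for lexical and commonsense knowledge. All names and numbers are invented for illustration, and the exhaustive search is a stand-in for whatever optimization method the paper actually uses.

```python
from itertools import combinations

# Hypothetical detector output: label -> confidence (made-up numbers).
detections = {"dog": 0.9, "frisbee": 0.6, "toaster": 0.55, "grass": 0.5}

# Hypothetical symmetric relatedness scores in [0, 1]; missing pairs count as 0.
relatedness = {
    frozenset(("dog", "frisbee")): 0.8,
    frozenset(("dog", "grass")): 0.6,
    frozenset(("frisbee", "grass")): 0.5,
    frozenset(("dog", "toaster")): 0.05,
}

# Hypothetical generalizations used to expand the refined label set.
hypernyms = {"dog": "animal", "frisbee": "toy"}


def score(labels, weight=1.0):
    """Sum of detector confidences plus weighted pairwise coherence."""
    confidence = sum(detections[label] for label in labels)
    coherence = sum(
        relatedness.get(frozenset(pair), 0.0)
        for pair in combinations(labels, 2)
    )
    return confidence + weight * coherence


def refine(k=3, weight=1.0):
    """Pick the k-label subset with the highest score (brute force)."""
    best = max(combinations(sorted(detections), k),
               key=lambda subset: score(subset, weight))
    return sorted(best)


def expand(labels):
    """Append hypernyms of the selected labels as more general labels."""
    extra = sorted({hypernyms[label] for label in labels if label in hypernyms})
    return list(labels) + extra


if __name__ == "__main__":
    print(refine(weight=0.0))   # confidence only
    print(refine(weight=1.0))   # confidence plus coherence
    print(expand(refine(weight=1.0)))
```

With `weight=0.0` the selection depends on confidence alone and keeps the out-of-context label `toaster`. With `weight=1.0` the coherence term favors `grass` instead, and `expand` then adds the more general labels `animal` and `toy`.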
Abstract
The social media explosion has populated the Internet with a wealth of images. There are two existing paradigms for image retrieval: 1) content-based image retrieval (CBIR), which has traditionally used visual features for similarity search (e.g., SIFT features), and 2) tag-based image retrieval (TBIR), which has relied on user tagging (e.g., Flickr tags). CBIR now gains semantic expressiveness by advances in deep-learning-based detection of visual labels. TBIR benefits from query-and-click logs to automatically infer more informative labels. However, learning-based tagging still yields noisy labels and is restricted to concrete objects, missing out on generalizations and abstractions. Click-based tagging is limited to terms that appear in the textual context of an image or in queries that lead to a click. This paper addresses the above limitations by semantically refining and expanding…
