Automatic Visual Theme Discovery from Joint Image and Text Corpora
Ke Sun, Xianxu Hou, Qian Zhang, Guoping Qiu

TL;DR
This paper introduces an unsupervised framework that discovers visual themes from joint image and text data. It clusters tags by their visual and semantic similarity to improve semantic image understanding, and reports better performance than tag-based representations on image search and labeling tasks.
Contribution
The paper proposes a novel unsupervised method combining visual and semantic similarities to discover compact visual themes, outperforming traditional tag-based approaches.
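To make the idea of combining visual and semantic similarity concrete, here is a minimal sketch in Python. It is not the authors' algorithm, since the summary does not give the actual features, similarity measures, or clustering procedure. Everything in it is an assumption made for illustration: a tag's visual prototype is the mean feature vector of the images carrying that tag, both similarities are cosine similarities, the weight `alpha` and the cut-off `threshold` are hypothetical parameters, and tags are grouped by simple thresholded linking with union-find.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def tag_prototypes(image_features, image_tags):
    """Assumed visual descriptor of a tag: mean feature of the images that carry it."""
    sums, counts = {}, {}
    for feat, tags in zip(image_features, image_tags):
        for tag in tags:
            if tag not in sums:
                sums[tag] = [0.0] * len(feat)
                counts[tag] = 0
            sums[tag] = [s + f for s, f in zip(sums[tag], feat)]
            counts[tag] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def discover_themes(image_features, image_tags, tag_embeddings,
                    alpha=0.5, threshold=0.8):
    """Group tags whose combined visual/semantic similarity reaches `threshold`.

    `alpha` weights visual against semantic similarity; both parameters are
    hypothetical and do not come from the paper.
    """
    protos = tag_prototypes(image_features, image_tags)
    tags = sorted(protos)
    parent = {t: t for t in tags}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            visual = cosine(protos[a], protos[b])
            semantic = cosine(tag_embeddings[a], tag_embeddings[b])
            if alpha * visual + (1 - alpha) * semantic >= threshold:
                parent[find(a)] = find(b)  # link the two tags into one theme

    themes = {}
    for t in tags:
        themes.setdefault(find(t), []).append(t)
    return sorted(themes.values())


# Toy data (made up): three images with 2-D features and their tags.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
tags_per_image = [["beach", "sea"], ["sea", "ocean"], ["forest"]]
embeddings = {
    "beach": [1.0, 0.2], "sea": [1.0, 0.0],
    "ocean": [0.95, 0.05], "forest": [0.0, 1.0],
}
print(discover_themes(features, tags_per_image, embeddings))
```

On this toy data the three water-related tags end up in one group and `forest` stays on its own, which mirrors the claim that several differently worded tags can collapse into a single, more compact theme.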
Findings
Visual themes outperform tags in semantic image understanding
The framework achieves state-of-the-art results in image search and labeling
User studies confirm the rationality of the discovered themes
Abstract
A popular approach to semantic image understanding is to manually tag images with keywords and then learn a mapping from visual features to keywords. Manually tagging images is a subjective process and the same or very similar visual contents are often tagged with different keywords. Furthermore, not all tags have the same descriptive power for visual contents and large vocabulary available from natural language could result in a very diverse set of keywords. In this paper, we propose an unsupervised visual theme discovery framework as a better (more compact, efficient and effective) alternative to semantic representation of visual contents. We first show that tag based annotation lacks consistency and compactness for describing visually similar contents. We then learn the visual similarity between tags based on the visual features of the images containing the tags. At the same…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
