VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning
Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata

TL;DR
This paper introduces a method to generate visually-grounded semantic embeddings for zero-shot learning that do not require human annotation, improving the alignment between semantic and visual similarities and enhancing zero-shot performance.
Contribution
The authors propose a novel unsupervised approach to create semantic embeddings that incorporate visual properties, outperforming traditional word embeddings in zero-shot learning tasks.
Findings
The learned embeddings reflect visual similarities between classes better than word embeddings do.
The method improves zero-shot learning accuracy across three benchmarks.
Visual clustering enhances semantic representations for unseen classes.
Abstract
Human-annotated attributes serve as powerful semantic embeddings in zero-shot learning. However, their annotation process is labor-intensive and needs expert supervision. Current unsupervised semantic embeddings, i.e., word embeddings, enable knowledge transfer between classes. However, word embeddings do not always reflect visual similarities and result in inferior zero-shot performance. We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning, without requiring any human annotation. Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity, and further imposes their class discrimination and semantic relatedness. To associate these clusters with previously unseen classes, we use external knowledge, e.g., word embeddings, and propose a novel class relation…
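The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain k-means on random stand-in patch features instead of a learned clustering model, omits the class-discrimination and semantic-relatedness objectives, and replaces the paper's class relation step (truncated above) with an assumed softmax-weighted average over word-embedding similarities. All function names, dimensions, and the `temperature` parameter are hypothetical.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain k-means on patch features; returns (centroids, hard assignments)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every patch to every centroid.
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):
                centroids[j] = members.mean(0)
    return centroids, assign


def seen_class_embeddings(assign, patch_labels, n_classes, k):
    """Row c holds the fraction of class c's patches that fall in each cluster."""
    emb = np.zeros((n_classes, k))
    np.add.at(emb, (patch_labels, assign), 1.0)
    return emb / emb.sum(1, keepdims=True)


def unseen_class_embeddings(seen_emb, w_seen, w_unseen, temperature=0.1):
    """Assumed stand-in for the class relation step: a softmax-weighted
    average of seen-class embeddings, weighted by word-vector cosine similarity."""
    a = w_unseen / np.linalg.norm(w_unseen, axis=1, keepdims=True)
    b = w_seen / np.linalg.norm(w_seen, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits -= logits.max(1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(1, keepdims=True)
    return weights @ seen_emb


# Synthetic stand-ins: 4 seen classes with 100 patches each, 2 unseen classes.
rng = np.random.default_rng(0)
n_seen, n_unseen, k = 4, 2, 8
patch_feats = rng.normal(size=(400, 16))         # hypothetical patch features
patch_labels = np.repeat(np.arange(n_seen), 100)  # class label of each patch
w_seen = rng.normal(size=(n_seen, 32))            # hypothetical word vectors
w_unseen = rng.normal(size=(n_unseen, 32))

_, assign = kmeans(patch_feats, k)
seen_emb = seen_class_embeddings(assign, patch_labels, n_seen, k)
unseen_emb = unseen_class_embeddings(seen_emb, w_seen, w_unseen)
print(seen_emb.shape, unseen_emb.shape)
```

Each class embedding here is a distribution over visual clusters, so every row sums to one. In a real setting the patch features would come from an image encoder and the word vectors from a pretrained language model.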
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Viral Infections and Outbreaks Research
