TL;DR
This paper introduces a multi-modal triplet loss (MMTL) and a Disentangling Class Representation GAN (DCR-GAN) for zero-shot learning, aiming to better disentangle class representations and synthesize visual features for both seen and unseen classes.
Contribution
It proposes a multi-modal triplet loss and a disentangling GAN framework to enhance feature disentanglement and synthesis in zero-shot learning.
Findings
Reports superior performance on four benchmark datasets.
Disentangles class representations, which is intended to improve generalization to unseen classes.
Reports outperforming state-of-the-art methods on ZSL tasks.
Abstract
Using generative models to synthesize visual features from semantic distributions has been one of the most popular approaches to zero-shot learning (ZSL) image classification in recent years. The triplet loss (TL) is widely used to generate realistic visual distributions from semantics by automatically searching for discriminative representations. However, the traditional TL cannot find reliable disentangled representations for unseen classes, because unseen classes are unavailable in ZSL. To alleviate this drawback, we propose a multi-modal triplet loss (MMTL), which uses multimodal information to search for a disentangled representation space. As a result, all classes can interact, which benefits learning disentangled class representations in the searched space. Furthermore, we develop a novel model called Disentangling Class Representation Generative Adversarial Network (DCR-GAN) focusing on exploiting the…
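For readers unfamiliar with the triplet loss the abstract refers to, here is a minimal sketch of the standard hinge formulation, max(0, d(a, p) - d(a, n) + margin). It is not the paper's MMTL, whose exact form the truncated abstract does not give. The cross-modal pairing (a semantic anchor with visual positives and negatives), the shared 2-D space, the margin value, and all function and variable names are assumptions made purely for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-form triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(0.0, gap)


# Hypothetical toy data (not from the paper): a class-level semantic
# embedding serves as the anchor, and visual features act as the positive
# and negative, all assumed to be projected into a shared 2-D space already.
semantic_anchor = [0.0, 0.0]
visual_same_class = [0.0, 1.0]   # distance 1 from the anchor
visual_other_class = [3.0, 4.0]  # distance 5 from the anchor

# Positive is much closer than the negative, so the margin is satisfied.
easy = triplet_loss(semantic_anchor, visual_same_class, visual_other_class)

# Swapping the roles violates the margin and yields a positive loss.
hard = triplet_loss(semantic_anchor, visual_other_class, visual_same_class)

print(easy, hard)
```

The loss is zero once the matching sample is closer to the anchor than the non-matching one by at least the margin, which is what pushes representations of different classes apart.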
Taxonomy
Methods: Triplet Loss
