Learning Graph Embeddings for Compositional Zero-shot Learning
Muhammad Ferjad Naeem, Yongqin Xian, Federico Tombari, Zeynep Akata

TL;DR
This paper introduces Compositional Graph Embedding (CGE), a novel graph-based approach for recognizing unseen compositions of visual concepts in the zero-shot setting. CGE outperforms existing methods on standard benchmarks.
Contribution
The paper proposes a new end-to-end graph formulation that models dependencies between visual primitives, enabling zero-shot generalization without external knowledge bases.
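A small sketch can illustrate the graph idea. It assumes NumPy, a graph in which each state-object composition node is linked to its state node and its object node (plus self-loops), and the symmetric normalization used by a standard graph convolutional network (GCN). The exact edge set and layer design in CGE may differ, and the names `build_composition_graph` and `gcn_layer` are hypothetical, not taken from the authors' code. Because an unseen composition such as old dog shares neighbours with seen pairs, propagation gives it an embedding even though no training image depicts it.

```python
import numpy as np


def build_composition_graph(states, objects, compositions):
    """Return node keys, a key->row index, and a normalized adjacency matrix.

    Illustrative assumption: each composition (state, obj) is linked to its
    state node and its object node, and every node has a self-loop.
    """
    # Tag keys by node type so a word used as both a state and an object stays distinct.
    nodes = ([("state", s) for s in states]
             + [("object", o) for o in objects]
             + [("comp", pair) for pair in compositions])
    index = {key: i for i, key in enumerate(nodes)}

    adj = np.eye(len(nodes))  # self-loops
    for state, obj in compositions:
        c = index[("comp", (state, obj))]
        for p in (index[("state", state)], index[("object", obj)]):
            adj[c, p] = adj[p, c] = 1.0

    # D^{-1/2} A D^{-1/2}, the normalization used by standard GCNs
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return nodes, index, adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weight):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


states, objects = ["old", "cute"], ["dog", "car"]
seen = [("old", "car"), ("cute", "dog")]
unseen = [("old", "dog")]
nodes, index, a_hat = build_composition_graph(states, objects, seen + unseen)

rng = np.random.default_rng(0)
x = rng.normal(size=(len(nodes), 8))  # hypothetical initial node features
h = gcn_layer(a_hat, x, rng.normal(size=(8, 4)))
print(h.shape)  # (7, 4)
```

Stacking several such layers lets information from seen pairs like old car and cute dog reach the unseen old dog node through the shared old and dog nodes.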
Findings
CGE outperforms the state of the art on the MIT-States and UT-Zappos datasets.
Introduces a new benchmark based on the GQA dataset.
Demonstrates effective knowledge transfer from seen to unseen compositions.
Abstract
In compositional zero-shot learning, the goal is to recognize unseen compositions (e.g. old dog) of visual primitives, namely states (e.g. old, cute) and objects (e.g. car, dog), observed in the training set. This is challenging because the same state can, for example, alter the visual appearance of a dog in a drastically different way than that of a car. As a solution, we propose a novel graph formulation called Compositional Graph Embedding (CGE) that learns image features, compositional classifiers, and latent representations of visual primitives in an end-to-end manner. The key to our approach is exploiting the dependency between states, objects, and their compositions within a graph structure to enforce the relevant knowledge transfer from seen to unseen compositions. By learning a joint compatibility that encodes semantics between concepts, our model allows for generalization to unseen compositions without…
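The "joint compatibility" in the abstract can be sketched as a score between an image feature and each composition's embedding. This sketch assumes a plain dot product, a softmax cross-entropy loss over training compositions, and prediction by argmax over a candidate set of compositions; the names `compatibility_scores`, `cross_entropy`, and `predict` are hypothetical, and the toy vectors stand in for learned image features and graph-derived composition embeddings rather than anything reported in the paper.

```python
import numpy as np


def compatibility_scores(image_feats, comp_embeds):
    """Dot-product compatibility: (N, d) images x (C, d) compositions -> (N, C)."""
    return image_feats @ comp_embeds.T


def cross_entropy(scores, labels):
    """Mean softmax cross-entropy over composition classes (numerically stable)."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def predict(image_feats, comp_embeds, candidate_ids):
    """Pick the highest-scoring composition among the allowed candidates."""
    candidate_ids = np.asarray(candidate_ids)
    scores = compatibility_scores(image_feats, comp_embeds)[:, candidate_ids]
    return candidate_ids[scores.argmax(axis=1)]


# Rows: 0 = old car, 1 = cute dog (seen); 2 = old dog (unseen). Toy values.
comp_embeds = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
image = np.array([[0.9, 1.1]])

print(predict(image, comp_embeds, [0, 1, 2]))  # [2]
print(predict(image, comp_embeds, [0, 1]))     # [1]
```

Changing `candidate_ids` controls which compositions the model may predict; in this toy case the unseen old dog is chosen only when it is part of the candidate set.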
