Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
Siteng Huang, Biao Gong, Yutong Feng, Min Zhang, Yiliang Lv, Donglin Wang

TL;DR
Troika is a multi-path approach to compositional zero-shot learning (CZSL) that explicitly models the state, the object, and their composition in separate branches, and uses Cross-Modal Traction to improve generalization to unseen compositions and performance on benchmarks.
Contribution
Troika proposes a novel multi-branch framework with cross-modal traction that models states, objects, and state-object compositions separately in CZSL, improving generalization to unseen combinations.
Findings
Significantly outperforms existing methods on benchmarks.
Effective in both closed-world and open-world settings.
Demonstrates the benefit of explicit state-object modeling.
Abstract
Recent compositional zero-shot learning (CZSL) methods adapt pre-trained vision-language models (VLMs) by constructing trainable prompts only for composed state-object pairs. Relying on learning the joint representation of seen compositions, these methods ignore the explicit modeling of the state and object, thus limiting the exploitation of pre-trained knowledge and generalization to unseen compositions. With a particular focus on the universality of the solution, in this work, we propose a novel paradigm for CZSL models that establishes three identification branches (i.e., Multi-Path) to jointly model the state, object, and composition. The presented Troika is our implementation that aligns the branch-specific prompt representations with decomposed visual features. To calibrate the bias between semantically similar multi-modal representations, we further devise a Cross-Modal Traction…
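The abstract describes three identification branches (state, object, composition) whose prompt representations are aligned with decomposed visual features. The sketch below illustrates how such a multi-path model could score candidate pairs at inference time. It assumes that each branch already yields one image feature plus a set of prompt embeddings, that similarity is temperature-scaled cosine similarity, and that the branches are fused by adding the composition probability to the product of the state and object probabilities. The function names, the temperature `tau`, the fusion rule, and the toy vectors are hypothetical illustrations, not Troika's released implementation. Cross-Modal Traction is not modeled here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[str, str]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(logits: Dict, tau: float) -> Dict:
    """Temperature-scaled softmax over a dict of logits (max-subtracted for stability)."""
    scaled = {k: v / tau for k, v in logits.items()}
    m = max(scaled.values())
    exps = {k: math.exp(v - m) for k, v in scaled.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def multi_path_predict(
    state_feat: Vector,
    object_feat: Vector,
    comp_feat: Vector,
    state_prompts: Dict[str, Vector],
    object_prompts: Dict[str, Vector],
    comp_prompts: Dict[Pair, Vector],
    candidate_pairs: List[Pair],
    tau: float = 0.1,
) -> Tuple[Pair, Dict[Pair, float]]:
    # State and object branches: classify each primitive independently.
    p_state = softmax({s: cosine(state_feat, t) for s, t in state_prompts.items()}, tau)
    p_object = softmax({o: cosine(object_feat, t) for o, t in object_prompts.items()}, tau)
    # Composition branch: classify directly over the candidate pairs
    # (e.g. seen pairs in a closed-world setting, all state x object pairs in open-world).
    p_comp = softmax({p: cosine(comp_feat, comp_prompts[p]) for p in candidate_pairs}, tau)
    # Hypothetical fusion rule: composition probability plus the product of
    # the state and object probabilities for the same pair.
    scores = {(s, o): p_comp[(s, o)] + p_state[s] * p_object[o] for (s, o) in candidate_pairs}
    best = max(scores, key=scores.get)
    return best, scores


# Toy 2-D prompt embeddings (hypothetical values chosen for illustration).
states = {"wet": [1.0, 0.0], "dry": [0.0, 1.0]}
objects = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
comps = {
    ("wet", "dog"): [1.0, 0.0],
    ("wet", "cat"): [0.6, 0.8],
    ("dry", "cat"): [0.0, 1.0],
    ("dry", "dog"): [-1.0, 0.0],
}
best, scores = multi_path_predict(
    state_feat=[0.9, 0.1], object_feat=[0.1, 0.9], comp_feat=[0.6, 0.8],
    state_prompts=states, object_prompts=objects, comp_prompts=comps,
    candidate_pairs=list(comps),
)
print(best)  # ('wet', 'cat')
```

Keeping separate state and object branches means a pair can still score well when each of its primitives is recognized confidently, even if the composition branch is unsure. This mirrors the abstract's argument that explicit state and object modeling helps generalization to unseen compositions, though the exact fusion used by Troika may differ from the rule assumed here.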
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Interpreting and Communication in Healthcare
