Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning

Yuxia Geng; Runkai Zhu; Jiaoyan Chen; Jintai Chen; Xiang Chen; Zhuo Chen; Shuofei Qiao; Yuxiang Wang; Xiaoliang Xu; Sheng-Jun Huang

arXiv:2408.09786·cs.CV·June 2, 2025

TL;DR

This paper introduces a graph-guided cross-composition feature disentanglement method for compositional zero-shot learning (CZSL). Built on the pre-trained vision-language model CLIP, it aims to make disentangled primitive features generalize across compositions.

Contribution

The paper proposes cross-composition feature disentanglement, guided by a compositional graph and implemented with adapters in CLIP, so that primitive (attribute and object) features generalize better across compositions in CZSL.

Findings

01 Significant performance improvements on three CZSL benchmarks.

02 Effective disentanglement of primitive features across compositions.

03 Validation of components through ablation studies.

Abstract

Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL). However, due to the feature divergence of an attribute (resp. object) when combined with different objects (resp. attributes), it is challenging to learn disentangled primitive features that are general across different compositions. To this end, we propose the solution of cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs and constrains the disentangled primitive features to be general across these compositions. More specifically, we leverage a compositional graph to define the overall primitive-sharing relationships between compositions, and build a task-specific architecture upon the recently successful large pre-trained vision-language model (VLM) CLIP, with dual…
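The primitive-sharing relationships mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `build_primitive_sharing_graph`, the dictionary layout, and the example compositions are all hypothetical. It only shows one plausible way to record, for each attribute-object composition, which other compositions share its attribute or its object.

```python
from collections import defaultdict


def build_primitive_sharing_graph(compositions):
    """Link each (attribute, object) pair to the pairs that share a primitive with it.

    Hypothetical helper for illustration; the paper's actual graph
    construction may differ.
    """
    by_attr = defaultdict(set)
    by_obj = defaultdict(set)
    for attr, obj in compositions:
        by_attr[attr].add((attr, obj))
        by_obj[obj].add((attr, obj))

    graph = {}
    for attr, obj in compositions:
        node = (attr, obj)
        graph[node] = {
            # Other compositions with the same attribute but a different object.
            "same_attribute": sorted(by_attr[attr] - {node}),
            # Other compositions with the same object but a different attribute.
            "same_object": sorted(by_obj[obj] - {node}),
        }
    return graph


# Illustrative compositions, not taken from any benchmark.
compositions = [
    ("red", "apple"),
    ("red", "car"),
    ("green", "apple"),
    ("old", "car"),
]
graph = build_primitive_sharing_graph(compositions)
```

Under these assumptions, `graph[("red", "apple")]` lists `("red", "car")` as an attribute-sharing neighbor and `("green", "apple")` as an object-sharing neighbor. Groups like these are the kind of primitive-sharing inputs across which the abstract says disentangled features are constrained to be general.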

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training