FlowComposer: Composable Flows for Compositional Zero-Shot Learning
Zhenqi He, Lin Li, Long Chen

TL;DR
FlowComposer introduces a flow-based framework for compositional zero-shot learning, explicitly modeling attribute-object compositions in the embedding space to improve generalization over prior PEFT-based methods.
Contribution
The paper presents itself as the first to systematically apply flow matching to CZSL, constructing compositions explicitly in the embedding space and addressing residual feature entanglement.
Findings
Achieves significant improvements on three CZSL benchmarks.
Effectively disentangles attribute and object features.
Enhances generalization in zero-shot composition recognition.
Abstract
Compositional zero-shot learning (CZSL) aims to recognize unseen attribute-object compositions by recombining primitives learned from seen pairs. Recent CZSL methods built on vision-language models (VLMs) typically adopt parameter-efficient fine-tuning (PEFT). They apply visual disentanglers for decomposition and manipulate token-level prompts or prefixes to encode compositions. However, such PEFT-based designs suffer from two fundamental limitations: (1) Implicit Composition Construction, where composition is realized only via token concatenation or branch-wise prompt tuning rather than an explicit operation in the embedding space; (2) Remained Feature Entanglement, where imperfect disentanglement leaves attribute, object, and composition features mutually contaminated. Together, these issues limit the generalization ability of current CZSL models. In this paper, we are the first to…
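For readers unfamiliar with flow matching, the following is a minimal sketch of the generic objective with a straight-line path between a source vector and a target vector. It is not FlowComposer's formulation: the abstract above is truncated before the method is described, so treating `x0` as a primitive embedding and `x1` as a composition embedding is only an illustrative assumption, and every name here (`interpolate`, `target_velocity`, `flow_matching_loss`, `euler_transport`, `velocity_fn`) is hypothetical.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical signature: a velocity field takes a point and a time in [0, 1].
VelocityFn = Callable[[Vector, float], Vector]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight-line path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Velocity of the straight-line path; constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn: VelocityFn, x0: Vector, x1: Vector, t: float) -> float:
    """Squared error between the predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    predicted = velocity_fn(x_t, t)
    target = target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(predicted, target))


def euler_transport(velocity_fn: VelocityFn, x0: Vector, num_steps: int = 10) -> Vector:
    """Move x0 along the velocity field with fixed-step Euler integration."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In practice `velocity_fn` would be a trained network and the loss would be averaged over sampled pairs and times. The sketch only shows the sense in which a flow is an explicit operation on embeddings, in contrast to composing at the token level.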
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
