Semantically Guided Dynamic Visual Prototype Refinement for Compositional Zero-Shot Learning
Zhong Peng, Yishi Xu, Gerong Wang, Wenchao Chen, Bo Chen, Jing Zhang, Hongwei Liu

TL;DR
Duplex is a compositional zero-shot learning framework that dynamically refines visual prototypes through aggregation on mini-batch local graphs, improving visual discrimination and reducing seen bias when recognizing unseen state-object pairs.
Contribution
Duplex couples dual-prototype learning with dynamic local-graph refinement of visual prototypes, improving visual discrimination while preserving semantic structure in CZSL.
Findings
Achieves competitive performance on MIT-States, UT-Zappos, and CGQA datasets.
Effectively reduces seen bias and improves generalization to unseen pairs.
Enriches class prototypes with fine-grained visual evidence.
Abstract
Compositional Zero-Shot Learning (CZSL) seeks to recognize unseen state-object pairs by recombining primitives learned from seen compositions. Despite recent progress with vision-language models (VLMs), two limitations remain: (i) text-driven semantic prototypes are weakly discriminative in the visual feature space; and (ii) unseen pairs are optimized passively, thereby inducing seen bias. To address these limitations, we present Duplex, a framework that couples dual-prototype learning with dynamic local-graph refinement of visual prototypes. For each composition, Duplex maintains a semantic prototype via prompt learning and a visual prototype for unseen pairs constructed by recombining disentangled state and object primitives from seen images. The visual prototypes are updated dynamically through lightweight aggregation on mini-batch local graphs, which incorporates unseen compositions…
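The two mechanisms named in the abstract, recombining state and object primitives into visual prototypes and refining those prototypes on a mini-batch local graph, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the averaging used to compose primitives, the top-k cosine neighbourhood, the softmax edge weights, the momentum update, and every name and parameter (`compose_prototypes`, `refine_prototypes`, `k`, `alpha`, `temperature`) are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def compose_prototypes(state_prims, object_prims, pairs):
    """Build one visual prototype per (state, object) pair.

    Assumption: a pair prototype is the normalized sum of its state and
    object primitive vectors; the paper may compose them differently.
    """
    s_idx = np.array([s for s, _ in pairs])
    o_idx = np.array([o for _, o in pairs])
    return l2_normalize(state_prims[s_idx] + object_prims[o_idx])

def refine_prototypes(protos, batch_feats, k=3, alpha=0.1, temperature=0.1):
    """Refine prototypes by aggregating over a mini-batch local graph.

    Assumption: each prototype is linked to its k most similar batch
    features (cosine similarity), edges are softmax-weighted, and the
    aggregate is blended in with momentum `alpha`.
    """
    protos = l2_normalize(protos)
    feats = l2_normalize(batch_feats)
    sim = protos @ feats.T                              # (K, B) cosine similarities
    k = min(k, feats.shape[0])
    nbr = np.argsort(-sim, axis=1)[:, :k]               # (K, k) neighbour indices
    nbr_sim = np.take_along_axis(sim, nbr, axis=1)      # (K, k) neighbour similarities
    w = np.exp((nbr_sim - nbr_sim.max(axis=1, keepdims=True)) / temperature)
    w /= w.sum(axis=1, keepdims=True)                   # softmax edge weights
    agg = np.einsum("kn,knd->kd", w, feats[nbr])        # weighted neighbour mean
    return l2_normalize((1 - alpha) * protos + alpha * agg)

# Toy usage with random data standing in for image features.
rng = np.random.default_rng(0)
state_prims = rng.normal(size=(4, 16))    # 4 hypothetical states
object_prims = rng.normal(size=(5, 16))   # 5 hypothetical objects
pairs = [(0, 1), (2, 3), (3, 4)]          # hypothetical unseen pairs
protos = compose_prototypes(state_prims, object_prims, pairs)
batch = rng.normal(size=(8, 16))          # one mini-batch of features
refined = refine_prototypes(protos, batch)
```

In this sketch, setting `alpha=0` leaves the prototypes unchanged, while larger values pull each prototype toward its nearest batch features; how Duplex actually weights and schedules the update is not stated in the excerpt above.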
Taxonomy
Methods: ALIGN · Graph Neural Network · Focus
