Semantically Guided Dynamic Visual Prototype Refinement for Compositional Zero-Shot Learning

Zhong Peng; Yishi Xu; Gerong Wang; Wenchao Chen; Bo Chen; Jing Zhang; Hongwei Liu

arXiv:2501.07114 · cs.CV · February 3, 2026

TL;DR

Duplex is a novel framework for compositional zero-shot learning that dynamically refines visual prototypes using local graph aggregation, improving discrimination and reducing bias in recognizing unseen state-object pairs.

Contribution

The paper introduces a dual-prototype approach with dynamic local-graph refinement, which improves visual discrimination while preserving semantic structure in CZSL.

Findings

01. Achieves competitive performance on the MIT-States, UT-Zappos, and CGQA datasets.

02. Effectively reduces seen bias and improves generalization to unseen pairs.

03. Enriches class prototypes with fine-grained visual evidence.

Abstract

Compositional Zero-Shot Learning (CZSL) seeks to recognize unseen state-object pairs by recombining primitives learned from seen compositions. Despite recent progress with vision-language models (VLMs), two limitations remain: (i) text-driven semantic prototypes are weakly discriminative in the visual feature space; and (ii) unseen pairs are optimized passively, thereby inducing seen bias. To address these limitations, we present Duplex, a framework that couples dual-prototype learning with dynamic local-graph refinement of visual prototypes. For each composition, Duplex maintains a semantic prototype via prompt learning and a visual prototype for unseen pairs constructed by recombining disentangled state and object primitives from seen images. The visual prototypes are updated dynamically through lightweight aggregation on mini-batch local graphs, which incorporates unseen compositions…
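
As a rough illustration of the idea described above, the following NumPy sketch composes a visual prototype for each state-object pair from state and object primitive vectors, then moves each prototype toward its nearest features in a mini-batch. The averaging-based composition, the k-nearest-neighbor local graph, the mixing weight `alpha`, and all function names are assumptions made for this example; they are not taken from the paper and should not be read as Duplex's actual update rule.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def compose_prototypes(state_protos, object_protos, pairs):
    """Build one visual prototype per (state, object) index pair.

    Assumption: the two primitive vectors are simply averaged.
    """
    return np.stack(
        [(state_protos[s] + object_protos[o]) / 2.0 for s, o in pairs]
    )


def refine_prototypes(protos, batch_feats, k=2, alpha=0.5):
    """One hypothetical local-graph aggregation step.

    Each prototype is connected to its k most cosine-similar features
    in the mini-batch and is mixed with the mean of those neighbors.
    """
    p = l2_normalize(protos)
    f = l2_normalize(batch_feats)
    sim = p @ f.T                           # (num_pairs, batch_size)
    nbr = np.argsort(-sim, axis=1)[:, :k]   # top-k neighbor indices
    nbr_mean = f[nbr].mean(axis=1)          # (num_pairs, dim)
    return l2_normalize((1.0 - alpha) * p + alpha * nbr_mean)


# Toy usage with random vectors standing in for learned features.
rng = np.random.default_rng(0)
state_protos = rng.normal(size=(3, 8))    # 3 states, 8-dim features
object_protos = rng.normal(size=(4, 8))   # 4 objects
unseen_pairs = [(0, 1), (2, 3)]           # (state index, object index)

protos = compose_prototypes(state_protos, object_protos, unseen_pairs)
batch = rng.normal(size=(16, 8))          # mini-batch of image features
refined = refine_prototypes(protos, batch, k=3, alpha=0.3)
print(refined.shape)
```

Setting `alpha` to 0 leaves the normalized prototypes unchanged, while larger values pull them further toward the visual evidence in the batch; in this sketch the graph is rebuilt from scratch for every mini-batch.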

Taxonomy

Methods: ALIGN · Graph Neural Network · Focus