Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning

Hanjae Kim; Jiyoung Lee; Seongheon Park; Kwanghoon Sohn

arXiv:2308.04016·cs.CV·August 9, 2023

Open Access · 1 Repo

TL;DR

This paper introduces Composition Transformer (CoT), a hierarchical framework for compositional zero-shot learning that explicitly models attribute-object contextuality and addresses data imbalance, achieving state-of-the-art results on MIT-States, C-GQA, and VAW-CZSL.

Contribution

The paper proposes a novel hierarchical framework with object and attribute experts, and a minority attribute augmentation method for improved compositional zero-shot learning.

Findings

1. Achieves state-of-the-art performance on the MIT-States, C-GQA, and VAW-CZSL benchmarks.
2. Effectively models contextuality between attributes and objects.
3. Addresses data imbalance with virtual sample augmentation.
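The summary above does not spell out how the virtual samples are built, so the following is only a minimal sketch of one common way to do it, not the paper's procedure. It assumes (a) rare attributes are picked more often through inverse-frequency sampling weights and (b) a virtual sample is a convex blend of two feature vectors. The helper names `inverse_frequency_weights` and `blend`, the toy labels, and the mixing ratio `lam` are all hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_weights(attribute_labels):
    """Return sampling weights proportional to 1 / count, normalized to sum to 1.

    Assumption: rarer attributes should be sampled more often.
    """
    counts = Counter(attribute_labels)
    raw = {attr: 1.0 / n for attr, n in counts.items()}
    total = sum(raw.values())
    return {attr: w / total for attr, w in raw.items()}


def blend(features_a, features_b, lam):
    """Convex combination of two feature vectors (a mixup-style assumption)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(features_a, features_b)]


# Toy long-tailed attribute distribution (hypothetical data).
attribute_labels = ["red"] * 6 + ["wet"] * 3 + ["rusty"] * 1
weights = inverse_frequency_weights(attribute_labels)

# Draw one attribute to augment; the seeded generator keeps the draw repeatable.
rng = random.Random(0)
attrs = sorted(weights)
sampled = rng.choices(attrs, weights=[weights[a] for a in attrs], k=1)[0]

# Blend two toy feature vectors into one virtual sample.
virtual = blend([1.0, 0.0], [0.0, 1.0], lam=0.75)
```

In this toy setup the rarest attribute ("rusty") receives the largest sampling weight, which is the intended effect of rebalancing a long-tailed attribute distribution.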

Abstract

Compositional zero-shot learning (CZSL) aims to recognize unseen compositions using prior knowledge of known primitives (attributes and objects). Previous CZSL methods often struggle to capture the contextuality between attribute and object, to learn discriminative visual features, and to cope with the long-tailed distribution of real-world compositional data. We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues. CoT employs object and attribute experts in distinct ways to generate representative embeddings, using the visual network hierarchically. The object expert extracts representative object embeddings from the final layer in a bottom-up manner, while the attribute expert generates attribute embeddings in a top-down manner with a proposed object-guided attention module that models contextuality explicitly. To remedy biased…
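As a rough illustration of the object-guided attention idea in the abstract, the sketch below uses an object embedding as the query that weights patch features, then pools them into a single attribute embedding. This is a simplified single-head, dot-product version written for clarity and is an assumption, not the module from the paper. The function name `object_guided_attention`, the toy vectors, and the choice of scaled dot-product scoring are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def object_guided_attention(object_embedding, patch_features):
    """Pool patch features into one attribute embedding.

    The object embedding acts as the query, so patches that align with the
    recognized object contribute more to the attribute embedding.
    """
    scale = math.sqrt(len(object_embedding))
    scores = [dot(object_embedding, p) / scale for p in patch_features]
    weights = softmax(scores)
    dim = len(patch_features[0])
    attribute_embedding = [
        sum(w * p[i] for w, p in zip(weights, patch_features)) for i in range(dim)
    ]
    return attribute_embedding, weights


# Toy inputs (hypothetical): one object embedding and three patch features,
# standing in for features taken from an intermediate layer.
object_embedding = [1.0, 0.0]
patch_features = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
attribute_embedding, weights = object_guided_attention(object_embedding, patch_features)
```

The first patch points in the same direction as the object embedding, so it receives the largest attention weight. Conditioning the attribute embedding on the object in this way is one simple reading of "modeling contextuality explicitly".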

Code & Models

Repositories

hanjaekim98/cot (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Dental Research and COVID-19

Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Linear Layer · Adam · Dense Connections · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding