Super-class guided Transformer for Zero-Shot Attribute Classification
Sehyung Kim, Chanhyeong Yang, Jihwan Park, Taehoon Song, Hyunwoo J. Kim

TL;DR
SugaFormer is a novel transformer-based framework that leverages super-classes and knowledge transfer strategies to improve the scalability and generalizability of zero-shot attribute classification, achieving state-of-the-art results.
Contribution
The paper introduces SugaFormer, which uses super-classes to reduce the number of queries and to enable multi-context decoding, and adds knowledge transfer strategies to improve zero-shot attribute classification.
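To make the query-reduction idea concrete, here is a minimal sketch of why grouping attributes under super-classes shrinks the number of decoder queries. This summary does not say how SugaFormer initializes its super-class queries, so the sketch assumes, purely for illustration, that each super-class query is the normalized mean of its member attributes' text embeddings. The function `init_superclass_queries`, the attribute-to-super-class mapping, and the toy 2-D embeddings are all hypothetical stand-ins, not the authors' code.

```python
from collections import defaultdict
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def init_superclass_queries(attr_embeddings, attr_to_super):
    """Build one query per super-class instead of one per attribute.

    attr_embeddings: attribute name -> embedding (stand-in for VLM text
        embeddings; an assumption of this sketch).
    attr_to_super: attribute name -> super-class name (assumed to be given).
    Returns super-class name -> unit-normalized mean of member embeddings.
    The averaging rule is a hypothetical choice, not SugaFormer's method.
    """
    groups = defaultdict(list)
    for attr, emb in attr_embeddings.items():
        groups[attr_to_super[attr]].append(emb)

    queries = {}
    for sup, embs in groups.items():
        dim = len(embs[0])
        mean = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
        queries[sup] = l2_normalize(mean)
    return queries


# Toy example: five attributes collapse into two super-class queries.
attr_to_super = {"red": "color", "blue": "color", "green": "color",
                 "wooden": "material", "metal": "material"}
attr_embeddings = {"red": [1.0, 0.0], "blue": [0.0, 1.0], "green": [1.0, 1.0],
                   "wooden": [2.0, 0.0], "metal": [0.0, 2.0]}
queries = init_superclass_queries(attr_embeddings, attr_to_super)
print(len(attr_embeddings), "attribute queries ->", len(queries), "super-class queries")
# prints: 5 attribute queries -> 2 super-class queries
```

The point of the design is that the query count scales with the number of super-classes rather than the number of attributes, which matters when an attribute vocabulary runs into the hundreds or thousands.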
Findings
Achieves state-of-the-art performance on three benchmarks.
Effectively generalizes across datasets in zero-shot settings.
Improves scalability for large attribute sets.
Abstract
Attribute classification is crucial for identifying specific characteristics within image regions. Vision-Language Models (VLMs) have been effective in zero-shot tasks by leveraging general knowledge learned from large-scale datasets. Recent studies demonstrate that transformer-based models with class-wise queries can effectively address zero-shot multi-label classification. However, because these models make poor use of the relationship between seen and unseen attributes, they generalize poorly. Additionally, attribute classification generally involves many attributes, which makes it difficult to keep the model scalable. To address these issues, we propose Super-class guided transFormer (SugaFormer), a novel framework that leverages super-classes to enhance scalability and generalizability for zero-shot attribute classification. SugaFormer employs Super-class Query Initialization (SQI) to…
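The zero-shot setting described above rests on a simple property of VLMs: any attribute with a text embedding can be scored, whether or not it was seen in training. The sketch below shows generic CLIP-style scoring by cosine similarity followed by multi-label thresholding. It illustrates the general technique only and is not SugaFormer's decoder; the feature vectors are toy numbers standing in for image-region and text-encoder outputs, and the 0.5 threshold is a hypothetical choice.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def zero_shot_attribute_scores(region_feat, attr_text_embeddings):
    """Score every attribute, seen or unseen, against one region feature.

    Only a text embedding is required per attribute, which is what lets
    unseen attributes be scored without any labeled training examples.
    """
    return {attr: cosine(region_feat, emb)
            for attr, emb in attr_text_embeddings.items()}


def predict_attributes(scores, threshold):
    """Multi-label prediction: keep every attribute scoring >= threshold."""
    return sorted(attr for attr, s in scores.items() if s >= threshold)


# Toy region feature and attribute embeddings (hypothetical values).
region_feat = [1.0, 0.0, 0.0]
attr_text_embeddings = {
    "red": [1.0, 0.0, 0.0],     # treated as a seen attribute
    "shiny": [0.9, 0.1, 0.0],   # treated as an unseen attribute
    "wooden": [0.0, 1.0, 0.0],
}
scores = zero_shot_attribute_scores(region_feat, attr_text_embeddings)
print(predict_attributes(scores, threshold=0.5))
# prints: ['red', 'shiny']
```

Because each attribute gets an independent score, a region can carry several attributes at once, which is why thresholding is used here instead of an argmax over attributes.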
