Learning to Predict Visual Attributes in the Wild
Khoi Pham, Kushal Kafle, Zhe Lin, Zhihong Ding, Scott Cohen, Quan Tran, Abhinav Shrivastava

TL;DR
This paper introduces a large-scale in-the-wild visual attribute dataset and proposes advanced techniques to improve multi-label attribute prediction, significantly outperforming existing methods.
Contribution
The paper presents a new in-the-wild dataset for visual attribute prediction, with over 927K attribute annotations for over 260K object instances, along with methods that address challenges such as label sparsity and data imbalance.
Findings
Achieved a 3.7-point mAP improvement over state-of-the-art methods.
Developed a multi-hop attention-based CNN model.
Introduced a supervised attribute-aware contrastive learning algorithm.
Abstract
Visual attributes constitute a large portion of information contained in a scene. Objects can be described using a wide variety of attributes which portray their visual appearance (color, texture), geometry (shape, size, posture), and other intrinsic properties (state, action). Existing work is mostly limited to study of attribute prediction in specific domains. In this paper, we introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances. Formally, object attribute prediction is a multi-label classification problem where all attributes that apply to an object must be predicted. Our dataset poses significant challenges to existing methods due to large number of attributes, label sparsity, data imbalance, and object occlusion. To this end, we propose several techniques that systematically tackle…
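To make the multi-label framing and the label-sparsity issue concrete, here is a minimal sketch in plain Python. It is not the authors' method: it only illustrates the generic idea of scoring every attribute independently and computing a binary cross-entropy loss over the attributes that were actually annotated, skipping the unannotated ones. The attribute names, the `None` convention for missing labels, and the helpers `masked_bce` and `predict` are all assumptions made for illustration.

```python
import math

# Hypothetical attribute vocabulary, for illustration only.
ATTRIBUTES = ["red", "wooden", "round", "sitting"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def masked_bce(logits, labels):
    """Mean binary cross-entropy over annotated attributes only.

    labels[i] is 1 (positive), 0 (negative), or None (not annotated).
    Unannotated attributes are skipped rather than treated as negatives.
    """
    total, count = 0.0, 0
    for z, y in zip(logits, labels):
        if y is None:
            continue
        # Stable form of BCE with logits: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        count += 1
    return total / count if count else 0.0


def predict(logits, threshold=0.5):
    """Return indices of all attributes whose probability meets the threshold."""
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    logits = [2.0, -2.0, 0.0, -1.0]   # one score per attribute
    labels = [1, None, None, 0]       # sparse annotation for one object
    print("loss:", masked_bce(logits, labels))
    print("predicted:", [ATTRIBUTES[i] for i in predict(logits)])
```

Skipping unannotated attributes is only one possible design choice; treating them as negatives, or down-weighting them, are alternatives with different trade-offs under the data imbalance the abstract mentions.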
Taxonomy
Methods: Contrastive Learning
