VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
Changyao Tian, Wenhai Wang, Xizhou Zhu, Jifeng Dai, Yu Qiao

TL;DR
This paper introduces VL-LTR, a visual-linguistic framework that leverages text descriptions to enhance long-tailed visual recognition, significantly improving accuracy, especially for classes with few samples.
Contribution
The paper proposes a novel visual-linguistic approach for long-tailed recognition, utilizing noisy class-level text descriptions to improve visual recognition performance.
Findings
Achieves 77.2% accuracy on ImageNet-LT, surpassing previous methods by over 17 points.
Effectively learns visual representations from images and linguistic representations from noisy class-level text descriptions collected from the Internet.
Sets new state-of-the-art results on long-tailed recognition benchmarks.
Abstract
Deep learning-based models encounter challenges when processing long-tailed data in the real world. Existing solutions usually employ some balancing strategies or transfer learning to deal with the class imbalance problem, based on the image modality. In this work, we present a visual-linguistic long-tailed recognition framework, termed VL-LTR, and conduct empirical studies on the benefits of introducing text modality for long-tailed recognition (LTR). Compared to existing approaches, the proposed VL-LTR has the following merits. (1) Our method can not only learn visual representation from images but also learn corresponding linguistic representation from noisy class-level text descriptions collected from the Internet; (2) Our method can effectively use the learned visual-linguistic representation to improve the visual recognition performance, especially for classes with fewer image…
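The abstract says the learned visual-linguistic representation is used to improve recognition but does not spell out the mechanism here. The block below is a minimal sketch under stated assumptions, not the authors' implementation: it averages each class's (noisy) sentence embeddings into a class prototype, scores an image by cosine similarity to those prototypes, and blends those scores with an image classifier's logits. The toy vectors stand in for hypothetical text and image encoders, and the names `class_prototype`, `language_logits`, and `fuse_and_predict`, as well as the `temperature` and `alpha` values, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def class_prototype(sentence_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical helper: average the normalized embeddings of one class's
    text descriptions, then re-normalize, so a single noisy sentence has
    limited influence on the class's text representation."""
    dim = len(sentence_embeddings[0])
    normed = [l2_normalize(e) for e in sentence_embeddings]
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def language_logits(image_emb: Sequence[float],
                    prototypes: Sequence[Sequence[float]],
                    temperature: float = 0.07) -> Vector:
    """Cosine similarity between the image embedding and each class
    prototype, divided by a (hypothetical) temperature."""
    img = l2_normalize(image_emb)
    return [sum(a * b for a, b in zip(img, p)) / temperature for p in prototypes]


def fuse_and_predict(visual_logits: Sequence[float],
                     lang_logits: Sequence[float],
                     alpha: float = 0.5) -> Tuple[int, Vector]:
    """Blend vision-only and text-based scores with a fixed weight alpha
    (an assumption) and return the arg-max class and the fused scores."""
    fused = [alpha * v + (1.0 - alpha) * l
             for v, l in zip(visual_logits, lang_logits)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# Toy usage: two classes, each described by two noisy 2-D "sentence embeddings".
texts = [
    [[1.0, 0.0], [0.9, 0.1]],  # class 0 (think: a head class)
    [[0.0, 1.0], [0.1, 0.9]],  # class 1 (think: a tail class)
]
prototypes = [class_prototype(t) for t in texts]
image_emb = [0.2, 1.0]        # embedding of an image that belongs to class 1
visual_logits = [0.6, 0.4]    # a vision-only head that leans toward class 0
lang = language_logits(image_emb, prototypes)
pred, fused = fuse_and_predict(visual_logits, lang)
print(pred, [round(s, 2) for s in fused])
```

In the toy run the vision-only logits favor class 0, as a classifier trained on imbalanced data might favor a head class, but the image embedding lies much closer to class 1's text prototype, so the blended score selects class 1. Averaging several sentences per class is only one simple way to dampen noisy descriptions; the paper's actual handling of noisy text and its fusion scheme may differ.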
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
