VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Changyao Tian; Wenhai Wang; Xizhou Zhu; Jifeng Dai; Yu Qiao

arXiv:2111.13579 · cs.CV · July 20, 2022 · 1 citation

TL;DR

This paper introduces VL-LTR, a visual-linguistic framework that leverages text descriptions to enhance long-tailed visual recognition, significantly improving accuracy especially for classes with few samples.

Contribution

The paper proposes a novel visual-linguistic approach for long-tailed recognition, utilizing noisy class-level text descriptions to improve visual recognition performance.

Findings

01

Achieves 77.2% overall accuracy on ImageNet-LT, surpassing the previous best method by over 17 points.

02

Learns visual representations from images and linguistic representations from noisy class-level text descriptions collected from the Internet.

03

Sets new state-of-the-art results on long-tailed recognition benchmarks.

Abstract

Deep learning-based models encounter challenges when processing long-tailed data in the real world. Existing solutions usually employ some balancing strategies or transfer learning to deal with the class imbalance problem, based on the image modality. In this work, we present a visual-linguistic long-tailed recognition framework, termed VL-LTR, and conduct empirical studies on the benefits of introducing text modality for long-tailed recognition (LTR). Compared to existing approaches, the proposed VL-LTR has the following merits. (1) Our method can not only learn visual representation from images but also learn corresponding linguistic representation from noisy class-level text descriptions collected from the Internet; (2) Our method can effectively use the learned visual-linguistic representation to improve the visual recognition performance, especially for classes with fewer image…
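
One way to picture how class-level text could help classes with few images is to score each image embedding against one text embedding per class. The snippet below is only an illustrative sketch under stated assumptions, not the authors' implementation: the embeddings are toy vectors, and `l2_normalize`, `classify_by_text_similarity`, and the `temperature` value are hypothetical names and settings chosen for the example.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def classify_by_text_similarity(image_emb, class_text_emb, temperature=0.07):
    """Assign each image to the class whose text embedding is most similar.

    image_emb:      (num_images, dim) array of image embeddings (assumed given).
    class_text_emb: (num_classes, dim) array, one text embedding per class
                    (assumed given, e.g. pooled from class descriptions).
    temperature:    hypothetical softmax temperature for this sketch.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(class_text_emb)
    logits = img @ txt.T / temperature           # cosine similarity, scaled
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # softmax over classes
    return probs.argmax(axis=1), probs

# Toy data: three classes, two images, 3-dimensional embeddings.
class_text_emb = np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0],
                           [0.0, 0.0, 1.0]])
image_emb = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.2, 0.8]])

pred, probs = classify_by_text_similarity(image_emb, class_text_emb)
print(pred)  # [0 2]
```

Because every class has a text embedding regardless of how many images it has, a scheme like this gives rare classes a reference point that does not depend on image count, which is the intuition behind using text for the tail.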

Code & Models

Repositories

ChangyaoTian/VL-LTR · PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques