Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition
Siming Fu, Xiaoxuan He, Xinpeng Ding, Yuchen Cao, Hualiang Wang

TL;DR
This paper introduces a vision-language framework that uses uniformly distributed category prototypes on a hypersphere to improve long-tail recognition by balancing feature space distribution and enhancing class decision boundaries.
Contribution
The proposed method generates uniformly distributed category prototypes and incorporates mechanisms for irrelevant text filtering and attribute enhancement to address data imbalance in long-tail recognition.
Findings
Outperforms previous vision-language methods on long-tailed datasets.
Achieves state-of-the-art performance in long-tail recognition tasks.
Effectively balances feature space distribution and improves class boundary clarity.
Abstract
Recently, large-scale pre-trained vision-language models have shown benefits for alleviating class imbalance in long-tailed recognition. However, the long-tailed data distribution can corrupt the representation space, where the distance between head and tail categories is much larger than the distance between two tail categories. This uneven feature space distribution causes the model to exhibit unclear and inseparable decision boundaries on the uniformly distributed test set, which lowers its performance. To address these challenges, we propose a uniformly distributed category prototype-guided vision-language framework to effectively mitigate the feature space bias caused by data imbalance. Specifically, we generate a set of category prototypes uniformly distributed on a hypersphere. A category prototype-guided mechanism for image-text matching makes the features of different classes converge to…
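The abstract does not say how the uniformly distributed prototypes are obtained. As a rough illustration only, the sketch below spreads unit vectors over a hypersphere by repeatedly pushing each one away from its neighbours and re-normalizing. The function name `uniform_prototypes`, the log-sum-exp repulsion objective, and all hyperparameters are assumptions made for this example, not the authors' procedure.

```python
import numpy as np

def uniform_prototypes(num_classes, dim, steps=500, lr=0.1, seed=0):
    """Spread `num_classes` unit vectors on a hypersphere (hypothetical sketch).

    Minimizes sum_i logsumexp_{j != i}(p_i . p_j) by projected gradient
    descent, so that nearby prototypes repel each other most strongly.
    Assumes num_classes >= 2.
    """
    rng = np.random.default_rng(seed)
    p = rng.normal(size=(num_classes, dim))
    p /= np.linalg.norm(p, axis=1, keepdims=True)  # project onto the unit sphere

    for _ in range(steps):
        sim = p @ p.T                       # pairwise cosine similarities
        np.fill_diagonal(sim, -np.inf)      # exclude self-similarity
        # Row-wise softmax: each prototype weights its closest neighbours most.
        w = np.exp(sim - sim.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        # Gradient of the objective: p_i appears in its own row and in every
        # other row, hence the symmetric weight matrix.
        grad = (w + w.T) @ p
        p -= lr * grad
        p /= np.linalg.norm(p, axis=1, keepdims=True)  # re-project after the step
    return p

if __name__ == "__main__":
    protos = uniform_prototypes(num_classes=10, dim=16)
    sim = protos @ protos.T
    np.fill_diagonal(sim, -1.0)
    print("largest pairwise cosine similarity:", sim.max())
```

In a framework like the one described, such fixed prototypes would serve as per-class targets that image and text features are pulled toward; how that matching is done is not recoverable from the truncated abstract.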
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
