Learning from Imperfect Text Guidance: Robust Long-Tail Visual Recognition with High-Noise Label

Mengke Li; Haiquan Ling; Yiqun Zhang; Yang Lu; Hui Huang

arXiv:2604.23125·cs.CV·April 28, 2026

TL;DR

This paper introduces a method that uses pre-trained visual-language models to correct label-image mismatches in long-tailed, noisy datasets, improving the robustness of deep models.

Contribution

It proposes Weak Teacher Supervision (WTS), which uses the cross-modal alignment of pre-trained visual-language models to counter label noise and distribution bias in long-tailed visual recognition.

Findings

1. WTS outperforms existing methods on synthetic and real-world datasets.

2. WTS maintains robustness under high-noise label conditions.

3. The approach effectively corrects label-image mismatches using auxiliary text information.

Abstract

Real-world data often exhibit long-tailed distributions with numerous noisy labels, substantially degrading the performance of deep models. While prior research has made progress on this combined challenge, it overlooks the severe label-image mismatch inherent to high-noise settings, which limits its effectiveness. Given that observed labels, though mismatched with images, still retain category information, we propose using auxiliary text information from labels to address label-image inconsistencies in long-tailed noisy data. Specifically, we leverage the intrinsic cross-modal alignment in pre-trained visual-language models to correct the label-image inconsistencies. This supervisory signal, referred to as Weak Teacher Supervision (WTS), is unaffected by label noise and data distribution biases, although its accuracy is limited. Therefore, the activation of WTS is…
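The idea of checking observed labels against text guidance can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated before it describes how WTS is activated, so the code below only shows the generic zero-shot step that cross-modal alignment makes possible. It assumes image and class-name text embeddings have already been produced by some pre-trained visual-language model; the toy 2-D vectors, the function names (`weak_teacher_probs`, `flag_mismatches`), and the temperature of 0.07 are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weak_teacher_probs(image_emb, text_embs, temperature=0.07):
    """Softmax over image-to-text cosine similarities (zero-shot style).

    The temperature value is an assumption, not taken from the paper.
    """
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def flag_mismatches(image_embs, observed_labels, text_embs):
    """List (index, observed_label, teacher_prediction) for every sample
    whose text-guided prediction disagrees with its observed label."""
    flagged = []
    for i, (emb, y) in enumerate(zip(image_embs, observed_labels)):
        probs = weak_teacher_probs(emb, text_embs)
        pred = max(range(len(probs)), key=probs.__getitem__)
        if pred != y:
            flagged.append((i, y, pred))
    return flagged


# Toy data: two classes, three images. Real embeddings would come from a
# pre-trained visual-language model; these vectors are made up.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05]]
observed_labels = [0, 1, 1]  # the third label is deliberately wrong

flags = flag_mismatches(image_embs, observed_labels, text_embs)
print(flags)  # [(2, 1, 0)]
```

Because the teacher's prediction depends only on the image and the class-name text, it does not change with the noisy labels or the class frequencies in the training set, which matches the abstract's description of the signal as unaffected by label noise and distribution bias. The abstract also notes that its accuracy is limited, so a disagreement is a candidate mismatch rather than a confirmed one.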

Code & Models

Repositories

https://anonymous.4open.science/r/WTS-0F3C
