Why Can Accurate Models Be Learned from Inaccurate Annotations?

Chongjie Si; Yidan Cui; Fuchao Yang; Xiaokang Yang; Wei Shen

arXiv:2505.16159·cs.LG·May 23, 2025

Open Access

TL;DR

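This paper investigates why models trained on noisy, inaccurate labels can still make accurate predictions. It finds that the principal subspace of the learned weights stays largely aligned with that of weights trained on clean labels, which explains the robustness, and it proposes a method to strengthen that robustness.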

Contribution

The paper provides a theoretical analysis of label noise effects on model weights and introduces LIP, a plug-in to preserve principal subspace information under label inaccuracies.

Findings

01 Principal subspace remains largely aligned despite label noise

02 Angles of principal subspaces show minimal deviation with moderate noise

03 LIP improves model robustness across various noise conditions

Abstract

Learning from inaccurate annotations has gained significant attention due to the high cost of precise labeling. However, despite the presence of erroneous labels, models trained on noisy data often retain the ability to make accurate predictions. This intriguing phenomenon raises a fundamental yet largely unexplored question: why can models still extract correct label information from inaccurate annotations? In this paper, we conduct a comprehensive investigation into this issue. By analyzing weight matrices from both empirical and theoretical perspectives, we find that label inaccuracy primarily accumulates noise in lower singular components and subtly perturbs the principal subspace. Within a certain range, the principal subspaces of weights trained on inaccurate labels remain largely aligned with those learned from clean labels, preserving essential task-relevant…
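The alignment claim in the abstract can be illustrated with a small numerical sketch. The code below is not the paper's LIP method or its experimental setup. It is a toy example, assuming a synthetic low-rank "clean" weight matrix and additive Gaussian perturbation as a stand-in for the effect of label noise. The helper names (`principal_subspace`, `principal_angles`), the matrix sizes, and the noise scale are all hypothetical choices made for illustration. It measures the principal angles between the top-k left singular subspaces of the two matrices.

```python
import numpy as np


def principal_subspace(W, k):
    """Orthonormal basis (as columns) of the top-k left singular subspace of W."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k]


def principal_angles(W_a, W_b, k):
    """Principal angles (radians) between the top-k left singular subspaces."""
    U_a = principal_subspace(W_a, k)
    U_b = principal_subspace(W_b, k)
    # The singular values of U_a^T U_b are the cosines of the principal angles.
    cosines = np.linalg.svd(U_a.T @ U_b, compute_uv=False)
    # Clip to guard against round-off pushing values slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))


rng = np.random.default_rng(0)

# Hypothetical "clean" weights: a rank-3 matrix with large singular values.
U0, _ = np.linalg.qr(rng.standard_normal((64, 3)))
V0, _ = np.linalg.qr(rng.standard_normal((32, 3)))
W_clean = U0 @ np.diag([10.0, 8.0, 6.0]) @ V0.T

# Assumed stand-in for label noise: a small dense Gaussian perturbation,
# whose energy is spread over many small singular components.
W_noisy = W_clean + 0.05 * rng.standard_normal((64, 32))

angles = principal_angles(W_clean, W_noisy, k=3)
print("principal angles (degrees):", np.degrees(angles))
```

In this toy setting the perturbation is small relative to the gap between the leading singular values and the rest, so the angles stay small. Increasing the noise scale until it is comparable to the smallest leading singular value makes the subspaces drift apart, which mirrors the "within a certain range" qualifier in the abstract.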

Taxonomy

Topics: Machine Learning and Data Classification · Text and Document Classification Technologies · Imbalanced Data Classification Techniques

Methods: Softmax · Attention Is All You Need