Why Can Accurate Models Be Learned from Inaccurate Annotations?
Chongjie Si, Yidan Cui, Fuchao Yang, Xiaokang Yang, Wei Shen

TL;DR
This paper investigates why models trained on noisy, inaccurate labels can still make accurate predictions, revealing that principal subspace alignment explains robustness and proposing a method to improve it.
Contribution
The paper provides a theoretical analysis of label noise effects on model weights and introduces LIP, a plug-in to preserve principal subspace information under label inaccuracies.
Findings
Principal subspace remains largely aligned despite label noise
Angles between the principal subspaces of weights trained on clean and on noisy labels deviate only minimally under moderate noise
LIP improves model robustness across various noise conditions
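The findings above are stated in terms of angles between principal subspaces. As a rough illustration of how such angles can be measured, the sketch below computes principal angles between the top-k left singular subspaces of two weight matrices with NumPy. The function name `principal_angles`, the choice of left singular vectors, and the synthetic matrices are assumptions made for this example; they are not taken from the paper, and this is not an implementation of LIP.

```python
import numpy as np


def principal_angles(W_a, W_b, k):
    """Principal angles (radians) between the top-k left singular subspaces.

    Hypothetical helper for illustration; the paper's exact measurement
    protocol is not given in this summary.
    """
    # Columns of U are left singular vectors, ordered by decreasing singular value.
    U_a, _, _ = np.linalg.svd(W_a, full_matrices=False)
    U_b, _, _ = np.linalg.svd(W_b, full_matrices=False)
    # The singular values of U_a[:, :k]^T U_b[:, :k] are the cosines
    # of the principal angles between the two k-dimensional subspaces.
    cosines = np.linalg.svd(U_a[:, :k].T @ U_b[:, :k], compute_uv=False)
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return np.arccos(np.clip(cosines, -1.0, 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for "clean" weights: a rank-4 matrix (assumption).
    W_clean = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 32))
    # Toy stand-in for "noisy" weights: the same matrix plus a small perturbation.
    W_noisy = W_clean + 0.05 * rng.standard_normal(W_clean.shape)
    print(np.degrees(principal_angles(W_clean, W_noisy, k=4)))
```

Angles near zero indicate that the two subspaces are closely aligned, while angles near 90 degrees indicate that they are nearly orthogonal. The synthetic perturbation here only demonstrates the measurement and says nothing about the behavior of real label noise.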
Abstract
Learning from inaccurate annotations has gained significant attention due to the high cost of precise labeling. However, despite the presence of erroneous labels, models trained on noisy data often retain the ability to make accurate predictions. This intriguing phenomenon raises a fundamental yet largely unexplored question: why can models still extract correct label information from inaccurate annotations? In this paper, we conduct a comprehensive investigation into this issue. By analyzing weight matrices from both empirical and theoretical perspectives, we find that label inaccuracy primarily accumulates noise in lower singular components and subtly perturbs the principal subspace. Within a certain range, the principal subspaces of weights trained on inaccurate labels remain largely aligned with those learned from clean labels, preserving essential task-relevant…
Taxonomy
Topics: Machine Learning and Data Classification · Text and Document Classification Technologies · Imbalanced Data Classification Techniques
Methods: Softmax · Attention Is All You Need
