Rethinking Noise-Robust Training for Frozen Vision Foundation Models: A Cross-Dataset Benchmark with a Case Study of Small-Loss Failure
Zitong Li, Haoyu Wang

TL;DR
This paper benchmarks eight noisy-label learning methods on frozen vision foundation models across five medical datasets, finding no universally best method and highlighting the importance of regime-aware method selection.
Contribution
It provides a comprehensive benchmark and analysis of noisy-label methods for frozen VFMs, challenging the small-loss assumption and offering practical guidance for method choice.
Findings
ELR wins the most conditions (49/150), but there is no universal winner
Method cost increases sharply with noise severity
Prediction agreement is more stable than loss ranking under asymmetric noise
Abstract
Frozen Vision Foundation Models (VFMs) with lightweight classification heads are increasingly used in medical imaging because they offer efficient and reproducible deployment. Yet noisy-label learning methods for this frozen-feature regime remain poorly understood, and most existing methods still rely on a small-loss assumption inherited from end-to-end training. We present a controlled benchmark of eight noisy-label methods across five medical datasets, three backbones, two noise types, and five noise rates (150 conditions, 6,000 training runs), evaluated with balanced accuracy. The benchmark shows that there is no universal winner: in a Friedman ranking over the 150 conditions, ELR wins the most conditions (49/150), while CUFIT attains the best mean rank (2.51). The practical cost of method choice grows sharply with noise severity, from…
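The grid arithmetic and the gap between "most wins" and "best mean rank" in the abstract can be illustrated with a small sketch. It is not the authors' evaluation code: the dataset, backbone, and noise names are placeholders, the toy scores are invented for illustration, and reading the 5 runs per condition and method as repeated seeds is an assumption derived only from 6,000 / (150 × 8).

```python
from itertools import product

# Grid sizes come from the abstract; every name below is a placeholder.
datasets = [f"dataset_{i}" for i in range(5)]
backbones = [f"backbone_{i}" for i in range(3)]
noise_types = ["noise_type_a", "noise_type_b"]
noise_rates = [f"rate_{i}" for i in range(5)]
n_methods = 8

conditions = list(product(datasets, backbones, noise_types, noise_rates))
n_conditions = len(conditions)  # 5 * 3 * 2 * 5

# 6,000 runs over 150 conditions and 8 methods leaves 5 runs per pair.
# Interpreting these as repeated seeds is an assumption.
runs_per_cell = 6000 // (n_conditions * n_methods)


def average_ranks(scores):
    """Rank scores so the highest gets rank 1; ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def summarize(score_table):
    """score_table[c][m] is the balanced accuracy of method m in condition c."""
    width = len(score_table[0])
    rank_sums = [0.0] * width
    wins = [0] * width
    for row in score_table:
        for m, r in enumerate(average_ranks(row)):
            rank_sums[m] += r
        wins[row.index(max(row))] += 1  # ties go to the first method listed
    mean_ranks = [s / len(score_table) for s in rank_sums]
    return mean_ranks, wins


# Invented toy scores: 4 conditions (rows) by 3 hypothetical methods (columns).
toy = [
    [0.90, 0.85, 0.60],
    [0.88, 0.86, 0.70],
    [0.50, 0.80, 0.75],
    [0.55, 0.78, 0.79],
]
mean_ranks, wins = summarize(toy)
print(n_conditions, runs_per_cell)
print(mean_ranks, wins)
```

In the toy table the first method wins the most conditions but ranks last whenever it loses, so the second method ends up with the better mean rank. The same mechanism would allow one method to lead on win count while another leads on mean rank, as the abstract reports for ELR and CUFIT.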
