Generalized Jensen-Shannon Divergence Loss for Learning with Noisy Labels
Erik Englesson, Hossein Azizpour

TL;DR
This paper introduces a generalized Jensen-Shannon divergence loss that interpolates between cross entropy and mean absolute error, improving learning robustness with noisy labels and achieving state-of-the-art results on noisy datasets.
Contribution
It proposes a novel generalized Jensen-Shannon divergence loss that enhances robustness to noisy labels by encouraging consistency around data points.
Findings
Achieves state-of-the-art results on CIFAR with synthetic noise.
Outperforms existing methods on WebVision with real-world noise.
Demonstrates improved robustness across varying noise rates.
Abstract
Prior works have found it beneficial to combine provably noise-robust loss functions, e.g., mean absolute error (MAE), with standard categorical loss functions, e.g., cross entropy (CE), to improve their learnability. Here, we propose to use Jensen-Shannon divergence as a noise-robust loss function and show that it interpolates between CE and MAE with a controllable mixing parameter. Furthermore, we make a crucial observation that CE exhibits lower consistency around noisy data points. Based on this observation, we adopt a generalized version of the Jensen-Shannon divergence for multiple distributions to encourage consistency around data points. Using this loss function, we show state-of-the-art results on both synthetic (CIFAR) and real-world (e.g., WebVision) noise with varying noise rates.
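To make the "generalized version of the Jensen-Shannon divergence for multiple distributions" concrete, here is a minimal sketch in pure Python. It assumes the standard weighted definition, the sum over i of w_i * KL(p_i || m) with mixture m = sum over i of w_i * p_i. The function names, the example weights, and the example predictions are hypothetical, and any scaling or normalization the paper applies to turn the divergence into its final loss is omitted. This is an illustration of the quantity, not the authors' implementation.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions, with 0 * log(0 / q) taken as 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)


def generalized_js(dists, weights):
    """Weighted Jensen-Shannon divergence among several distributions.

    Computes sum_i w_i * KL(p_i || m), where m = sum_i w_i * p_i.
    Assumption: weights are non-negative and sum to 1.
    """
    if len(dists) != len(weights):
        raise ValueError("need exactly one weight per distribution")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    num_classes = len(dists[0])
    # Mixture distribution m, computed class by class.
    mixture = [
        sum(w * p[k] for w, p in zip(weights, dists))
        for k in range(num_classes)
    ]
    # Zero-weight distributions contribute nothing and are skipped,
    # which also avoids dividing by a zero mixture entry.
    return sum(
        w * kl_divergence(p, mixture)
        for w, p in zip(weights, dists)
        if w > 0.0
    )


if __name__ == "__main__":
    # Hypothetical example: a one-hot label plus predictions for two
    # perturbed views of the same input (values chosen for illustration).
    label = [1.0, 0.0, 0.0]
    view_a = [0.7, 0.2, 0.1]
    view_b = [0.6, 0.3, 0.1]
    print(generalized_js([label, view_a, view_b], [0.5, 0.25, 0.25]))
```

With two distributions (the label and one prediction) this reduces to the ordinary weighted Jensen-Shannon divergence, where the weight on the label plays the role of the mixing parameter mentioned in the abstract. Adding more predictions penalizes disagreement among them as well as disagreement with the label, which is one way to read "encourage consistency around data points".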
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Statistical Methods and Models · Imbalanced Data Classification Techniques
