Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

Kaidi Cao; Colin Wei; Adrien Gaidon; Nikos Arechiga; Tengyu Ma

arXiv:1906.07413 · cs.LG · October 29, 2019 · 226 cites

TL;DR

This paper introduces a theoretically grounded margin loss and a training schedule to improve deep learning performance on imbalanced datasets, demonstrating significant gains on benchmarks like iNaturalist 2018.

Contribution

The paper presents a novel label-distribution-aware margin loss and a deferred re-weighting training schedule for better handling class imbalance in deep learning.

Findings

1. The LDAM loss improves generalization on minority classes.
2. Deferred re-weighting improves training stability and accuracy.
3. Combining the two methods outperforms existing techniques on the benchmark vision tasks tested.
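The deferred re-weighting idea in the second finding can be illustrated with a small sketch: class weights stay uniform during an initial stage and switch to balanced weights afterwards. This is not the authors' implementation. The function name `drw_class_weights` and the parameter `defer_until` are hypothetical, and inverse-frequency weighting is an assumed stand-in for whichever re-weighting scheme is actually used.

```python
def drw_class_weights(class_counts, epoch, defer_until):
    """Return per-class loss weights for the given epoch.

    Assumption: before `defer_until` every class is weighted equally;
    from `defer_until` onward, inverse-frequency weights are used,
    normalized so that the weights sum to the number of classes.
    """
    num_classes = len(class_counts)
    if epoch < defer_until:
        # Initial stage: plain, unweighted training.
        return [1.0] * num_classes
    # Deferred stage: rarer classes receive larger weights.
    inverse = [1.0 / n for n in class_counts]
    norm = num_classes / sum(inverse)
    return [w * norm for w in inverse]


# Hypothetical two-class example: 5000 vs. 50 training samples.
early = drw_class_weights([5000, 50], epoch=0, defer_until=160)
late = drw_class_weights([5000, 50], epoch=160, defer_until=160)
```

In a training loop, the returned weights would multiply the per-sample loss according to each sample's label, so the representation is learned without re-weighting and the decision boundary is adjusted only in the later stage.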

Abstract

Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes. We design two novel methods to improve performance in such scenarios. First, we propose a theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound. This loss replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling. Second, we propose a simple, yet effective, training schedule that defers re-weighting until after the initial stage, allowing the model to learn an initial representation while avoiding some of the complications associated with re-weighting or re-sampling. We test our methods on several benchmark vision tasks…
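As a rough illustration of how a label-distribution-aware margin can replace standard cross-entropy, the sketch below subtracts a class-dependent margin from the true-class logit before applying softmax cross-entropy. It assumes the margin for class j scales as n_j^(-1/4), where n_j is the number of training samples in class j, which is how the full paper is commonly summarized. The function names and the `max_margin` rescaling constant are hypothetical choices for this example, not taken from the text above.

```python
import math


def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j^(-1/4).

    Assumption: margins are rescaled so the rarest class
    receives exactly `max_margin`.
    """
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [scale * r for r in raw]


def ldam_loss(logits, label, margins):
    """Cross-entropy on logits with the true-class logit reduced by its margin."""
    shifted = list(logits)
    shifted[label] -= margins[label]
    # Numerically stable log-sum-exp over the shifted logits.
    m = max(shifted)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in shifted))
    return log_sum_exp - shifted[label]


# Hypothetical two-class example: 5000 vs. 50 training samples.
margins = ldam_margins([5000, 50])
loss_rare = ldam_loss([1.0, 1.0], label=1, margins=margins)
```

Because rarer classes get larger margins, the model has to separate them from the decision boundary more strongly to reach the same loss. With all margins set to zero, the function reduces to ordinary softmax cross-entropy.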

Code & Models

Datasets

Beothuk/cifar10-lt-federated (dataset · 12 dl)

Taxonomy

Topics: Imbalanced Data Classification Techniques · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications