Class-Balanced Loss Based on Effective Number of Samples

Yin Cui; Menglin Jia; Tsung-Yi Lin; Yang Song; Serge Belongie

arXiv:1901.05555 · cs.CV · January 18, 2019 · 129 cites

TL;DR

This paper introduces a novel class-balanced loss function based on the effective number of samples, which improves performance on long-tailed datasets by re-weighting classes according to a new theoretical measure.

Contribution

The paper proposes a new theoretical framework for measuring data overlap and defines the effective number of samples to enhance class re-balancing in long-tailed datasets.

Findings

01. Significant performance improvements on long-tailed CIFAR datasets.

02. Effective number-based re-weighting outperforms traditional methods.

03. Successful application to large-scale datasets like ImageNet and iNaturalist.

Abstract

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1 - \beta^{n}) / (1 - \beta)$, where $n$ is the number of samples and $\beta \in [0, 1)$ is a hyperparameter. We design a…
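The formula in the abstract can be illustrated with a minimal Python sketch. The function `effective_number` follows the stated expression directly. The helper `class_balanced_weights` is an assumption made for illustration: the abstract is truncated here, so weighting each class by the inverse of its effective number and normalising the weights to sum to the number of classes is one plausible reading of "re-weighting", not a confirmed description of the authors' loss. Both function names and the example class counts are hypothetical.

```python
from typing import List, Sequence


def effective_number(n: int, beta: float) -> float:
    """Effective number of samples, (1 - beta**n) / (1 - beta), for beta in [0, 1)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    if n < 1:
        raise ValueError("n must be a positive sample count")
    return (1.0 - beta ** n) / (1.0 - beta)


def class_balanced_weights(counts: Sequence[int], beta: float) -> List[float]:
    """Per-class weights inversely proportional to the effective number.

    Assumption (not stated in the truncated abstract): the weights are
    normalised so that they sum to the number of classes.
    """
    inverse = [1.0 / effective_number(n, beta) for n in counts]
    scale = len(counts) / sum(inverse)
    return [w * scale for w in inverse]


if __name__ == "__main__":
    # Hypothetical long-tailed class counts, chosen only for illustration.
    counts = [5000, 500, 50]
    for beta in (0.0, 0.9, 0.999):
        print(beta, class_balanced_weights(counts, beta))
```

With $\beta = 0$ the effective number is 1 for every class, so all classes receive the same weight. As $\beta$ approaches 1 the effective number approaches $n$, so the weights approach inverse class frequency. Intermediate values of $\beta$ interpolate between these two extremes.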

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · COVID-19 diagnosis using AI