Semi-Supervised Learning of Class Balance under Class-Prior Change by Distribution Matching

Marthinus Du Plessis (Tokyo Institute of Technology); Masashi Sugiyama (Tokyo Institute of Technology)

arXiv:1206.4677 · cs.LG · June 22, 2012 · 27 cites

Open Access

TL;DR

This paper introduces a method that estimates the class ratio of an unlabeled test dataset by matching the input distributions of the training and test data, so that the bias caused by a mismatch between training and test class balance can be corrected.

Contribution

It proposes a novel distribution matching approach for class ratio estimation under class-prior change without requiring labeled test data.

Findings

01. Experiments demonstrate effective class ratio estimation

02. Reduces the estimation bias that arises when the class balance of the training data differs from that of the test data

03. Applicable to real-world classification scenarios

Abstract

In real-world classification problems, the class balance in the training dataset does not necessarily reflect that of the test dataset, which can cause significant estimation bias. If the class ratio of the test dataset is known, instance re-weighting or resampling allows systematic bias correction. However, learning the class ratio of the test dataset is challenging when no labeled data is available from the test domain. In this paper, we propose to estimate the class ratio in the test dataset by matching probability distributions of training and test input data. We demonstrate the utility of the proposed approach through experiments.
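
The idea in the abstract can be illustrated with a minimal sketch for the binary case. The abstract does not say which discrepancy measure is used to match the distributions, so this sketch swaps in a squared distance between Gaussian-kernel mean embeddings, which is an assumption and not necessarily the paper's criterion. The function names, the bandwidth, and the synthetic data are all hypothetical. The test inputs are modeled as a mixture pi * p(x | y=+1) + (1 - pi) * p(x | y=-1) of the training class-conditional distributions, and pi is chosen so that the mixture is closest to the unlabeled test inputs. The estimated ratio is then turned into the per-class weights that instance re-weighting would use.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    # Pairwise Gaussian kernel values between the rows of a and the rows of b.
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))

def estimate_positive_prior(x_pos, x_neg, x_test, bandwidth=1.0):
    """Estimate the test-set fraction of the positive class (hypothetical helper).

    Minimizes || pi*mu_pos + (1-pi)*mu_neg - mu_test ||^2 over pi, where mu_*
    are empirical kernel mean embeddings. With a = mu_pos - mu_neg and
    b = mu_test - mu_neg, the minimizer is <a, b> / <a, a>, clipped to [0, 1].
    """
    def mean_k(a, b):
        return gaussian_kernel(a, b, bandwidth).mean()

    k_nn = mean_k(x_neg, x_neg)
    k_pn = mean_k(x_pos, x_neg)
    inner_ab = mean_k(x_pos, x_test) - mean_k(x_neg, x_test) - k_pn + k_nn
    inner_aa = mean_k(x_pos, x_pos) - 2.0 * k_pn + k_nn
    return float(np.clip(inner_ab / inner_aa, 0.0, 1.0))

def class_weights(pi_test, pi_train):
    # Importance weights p_test(y) / p_train(y) for the positive and negative class.
    return pi_test / pi_train, (1.0 - pi_test) / (1.0 - pi_train)

# Synthetic example (assumed data): balanced training set, imbalanced test set.
rng = np.random.default_rng(0)
x_pos = rng.normal(2.0, 1.0, size=(300, 1))    # labeled training inputs, y = +1
x_neg = rng.normal(-2.0, 1.0, size=(300, 1))   # labeled training inputs, y = -1
x_test = np.vstack([
    rng.normal(2.0, 1.0, size=(160, 1)),       # 80% positives, labels unseen
    rng.normal(-2.0, 1.0, size=(40, 1)),       # 20% negatives, labels unseen
])

pi_hat = estimate_positive_prior(x_pos, x_neg, x_test)
w_pos, w_neg = class_weights(pi_hat, pi_train=0.5)
print(pi_hat, w_pos, w_neg)
```

The weights `w_pos` and `w_neg` could be passed as per-sample weights to any classifier that supports weighted training. The sketch relies on the class-prior-change assumption that the class-conditional input distributions are the same in the training and test data, and the quality of the estimate depends on the kernel bandwidth, which is fixed here only for simplicity.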

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Imbalanced Data Classification Techniques · Anomaly Detection Techniques and Applications