Binary Quantification and Dataset Shift: An Experimental Investigation

Pablo González; Alejandro Moreo; Fabrizio Sebastiani

arXiv:2310.04565·cs.LG·October 10, 2023

TL;DR

Through an extensive experimental analysis built on a detailed taxonomy of dataset shift, this paper investigates how current quantification algorithms perform under different types of shift, revealing their limitations and the need for more robust methods.

Contribution

The paper introduces a fine-grained taxonomy of dataset shift types, establishes protocols for generating datasets affected by each type of shift, and evaluates existing quantification methods across these scenarios.

Findings

01 Many methods that are robust to prior probability shift fail under other types of shift

02 No current method is robust to all of the simulated types of dataset shift

03 Quantification methods need further development to become more broadly applicable

Abstract

Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data, and is of special interest when the labelled data on which the predictor has been trained and the unlabelled data are not IID, i.e., suffer from dataset shift. To date, quantification methods have mostly been tested only on a special case of dataset shift, i.e., prior probability shift; the relationship between quantification and other types of dataset shift remains, by and large, unexplored. In this work we carry out an experimental analysis of how current quantification algorithms behave under different types of dataset shift, in order to identify limitations of current approaches and hopefully pave the way for the development of more broadly applicable methods. We do this by proposing a fine-grained taxonomy of types of dataset shift, by…
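
To make the notions of quantification and prior probability shift concrete, the following is a minimal sketch and not the authors' code. It assumes a simulated binary classifier whose true-positive and false-positive rates stay fixed while the class prevalence changes between labelled and unlabelled data, which is the prior probability shift setting. This page does not list the methods the paper evaluates; the two estimators shown, classify-and-count (CC) and adjusted classify-and-count (ACC), are standard baselines chosen here purely for illustration, and all function names, rates, and sample sizes are hypothetical.

```python
import random


def simulate_predictions(n, prevalence, tpr, fpr, rng):
    """Draw n true labels with the given positive prevalence and simulate a
    classifier with fixed tpr/fpr (assumption: only the class prior shifts)."""
    labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    preds = [
        1 if rng.random() < (tpr if y == 1 else fpr) else 0
        for y in labels
    ]
    return labels, preds


def estimate_rates(labels, preds):
    """Estimate (tpr, fpr) from labelled held-out data."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def classify_and_count(preds):
    """CC: the estimated prevalence is the fraction of predicted positives."""
    return sum(preds) / len(preds)


def adjusted_classify_and_count(preds, tpr, fpr):
    """ACC: correct the CC estimate using the classifier's tpr and fpr,
    then clip the result to the valid range [0, 1]."""
    cc = classify_and_count(preds)
    if tpr == fpr:  # the correction is undefined; fall back to CC
        return cc
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))


rng = random.Random(0)
TPR, FPR = 0.8, 0.2  # hypothetical classifier quality

# Labelled held-out data drawn at the training prevalence (0.5).
val_labels, val_preds = simulate_predictions(20000, 0.5, TPR, FPR, rng)
tpr_hat, fpr_hat = estimate_rates(val_labels, val_preds)

# Unlabelled data whose prevalence has shifted to 0.1.
test_labels, test_preds = simulate_predictions(20000, 0.1, TPR, FPR, rng)
true_prev = sum(test_labels) / len(test_labels)

cc_estimate = classify_and_count(test_preds)
acc_estimate = adjusted_classify_and_count(test_preds, tpr_hat, fpr_hat)

print(f"true prevalence: {true_prev:.3f}")
print(f"CC estimate:     {cc_estimate:.3f}")
print(f"ACC estimate:    {acc_estimate:.3f}")
```

In this simulation CC is biased towards the training prevalence, while ACC recovers a value close to the true one because its correction relies only on the rates that prior probability shift leaves unchanged. Under other types of shift those rates may themselves change, which is the kind of situation the paper sets out to examine.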

Code & Models

Repositories

pglez82/quant_datasetshift (Official)

Taxonomy

Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications