Binary Quantification and Dataset Shift: An Experimental Investigation
Pablo González, Alejandro Moreo, Fabrizio Sebastiani

TL;DR
Through an extensive experimental analysis built on a detailed taxonomy of dataset shift, this paper investigates how current quantification algorithms perform under different types of shift, revealing their limitations and the need for more robust methods.
Contribution
It introduces a fine-grained taxonomy of dataset shift types, establishes protocols for dataset generation under these shifts, and evaluates existing quantification methods across these scenarios.
Findings
Many methods robust to prior probability shift fail under other shifts
No current method is robust to all simulated dataset shifts
Quantification methods need development for broader applicability
Abstract
Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data, and is of special interest when the labelled data on which the predictor has been trained and the unlabelled data are not IID, i.e., suffer from dataset shift. To date, quantification methods have mostly been tested only on a special case of dataset shift, i.e., prior probability shift; the relationship between quantification and other types of dataset shift remains, by and large, unexplored. In this work we carry out an experimental analysis of how current quantification algorithms behave under different types of dataset shift, in order to identify limitations of current approaches and hopefully pave the way for the development of more broadly applicable methods. We do this by proposing a fine-grained taxonomy of types of dataset shift, by…
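To make the task concrete, the following is a minimal sketch of quantification under prior probability shift, the one case the abstract says has been widely tested. It is not taken from the paper: the abstract names no specific method, so the sketch uses two standard baselines from the quantification literature, Classify and Count (CC) and Adjusted Classify and Count (ACC). The synthetic Gaussian data, the fixed-threshold classifier, and all function names are assumptions made for illustration only.

```python
import random

def sample_with_prevalence(n, prevalence, rng):
    """Draw n labelled points with the given fraction of positives.

    Assumption: P(x|y) is fixed (two unit-variance Gaussians) and only
    P(y) changes between samples, which is prior probability shift.
    """
    n_pos = round(n * prevalence)
    data = [(rng.gauss(1.0, 1.0), 1) for _ in range(n_pos)]
    data += [(rng.gauss(-1.0, 1.0), 0) for _ in range(n - n_pos)]
    return data

def classify(x, threshold=0.0):
    """Hypothetical stand-in for a trained classifier: a fixed threshold."""
    return 1 if x > threshold else 0

def classify_and_count(xs):
    """CC: the fraction of items the classifier labels positive."""
    return sum(classify(x) for x in xs) / len(xs)

def estimate_rates(labelled):
    """Estimate the classifier's true and false positive rates."""
    pos = [x for x, y in labelled if y == 1]
    neg = [x for x, y in labelled if y == 0]
    tpr = sum(classify(x) for x in pos) / len(pos)
    fpr = sum(classify(x) for x in neg) / len(neg)
    return tpr, fpr

def adjusted_classify_and_count(xs, tpr, fpr):
    """ACC: correct the CC estimate using tpr and fpr, clipped to [0, 1]."""
    cc = classify_and_count(xs)
    if tpr == fpr:
        return cc  # the correction is undefined; fall back to CC
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))

rng = random.Random(0)
true_prevalence = 0.1

# Labelled data is balanced; the unlabelled set has only 10% positives.
train = sample_with_prevalence(5000, 0.5, rng)
test_xs = [x for x, _ in sample_with_prevalence(5000, true_prevalence, rng)]

tpr, fpr = estimate_rates(train)
cc_estimate = classify_and_count(test_xs)
acc_estimate = adjusted_classify_and_count(test_xs, tpr, fpr)

print(f"true prevalence: {true_prevalence}")
print(f"CC estimate:     {cc_estimate:.3f}")
print(f"ACC estimate:    {acc_estimate:.3f}")
```

In this setup CC is biased because the classifier's errors do not cancel out once the class prior changes, while the ACC correction relies on the true and false positive rates staying the same between the labelled and unlabelled data. That assumption holds under prior probability shift but need not hold under other types of shift, which is the kind of limitation the paper sets out to examine.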
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications
