Biquality Learning: a Framework to Design Algorithms Dealing with Closed-Set Distribution Shifts
Pierre Nodet, Vincent Lemaire, Alexis Bondu, Antoine Cornuéjols

TL;DR
This paper introduces a biquality learning framework that leverages both trusted and untrusted datasets to develop algorithms capable of handling complex distribution shifts in machine learning.
Contribution
It proposes two novel methods for biquality data, one inspired by the label noise literature and one by the covariate shift literature, to address distribution shifts.
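To make the label-noise side concrete, here is a minimal sketch of one well-known way trusted data is used in that literature: estimating a class-to-class noise transition matrix, as in Gold Loss Correction (Hendrycks et al., 2018). It is an illustration under assumptions, not the method proposed in this paper; the function names and toy numbers are hypothetical.

```python
def estimate_transition_matrix(untrusted_model_probs, trusted_labels, n_classes):
    """Estimate C[i][j] ~ P(untrusted label = j | true label = i).

    untrusted_model_probs: for each *trusted* sample, the class-probability
        vector predicted by a model fitted on the *untrusted* data.
    trusted_labels: the clean (trusted) labels of those same samples.
    """
    sums = [[0.0] * n_classes for _ in range(n_classes)]
    counts = [0] * n_classes
    for probs, y in zip(untrusted_model_probs, trusted_labels):
        counts[y] += 1
        for j, p in enumerate(probs):
            sums[y][j] += p
    matrix = []
    for i in range(n_classes):
        if counts[i] == 0:
            # No trusted example of class i: fall back to an identity row (no noise assumed).
            matrix.append([1.0 if j == i else 0.0 for j in range(n_classes)])
        else:
            # Average the untrusted model's predictions over trusted samples of class i.
            matrix.append([s / counts[i] for s in sums[i]])
    return matrix


def noisy_label_probs(clean_probs, matrix):
    """P(untrusted label = j | x) = sum_i P(true label = i | x) * C[i][j]."""
    n = len(matrix)
    return [sum(clean_probs[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Toy example: a hypothetical untrusted-data model evaluated on four trusted samples.
probs_on_trusted = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.4, 0.6]]
C = estimate_transition_matrix(probs_on_trusted, [0, 0, 1, 1], n_classes=2)
print(C)  # rows are roughly [0.8, 0.2] and [0.3, 0.7]
```

A model trained on untrusted labels would then pass its clean-label probabilities through `noisy_label_probs` before computing the loss, so that the label noise is absorbed by `C` rather than learned by the model.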
Findings
Two new methods for biquality learning tested on real datasets
Synthetic concept drift and class-conditional shifts introduced
Biquality learning algorithms show promise for handling distribution shifts
Abstract
Training machine learning models from data with weak supervision and dataset shifts is still challenging. Designing algorithms for settings where both situations arise has received little attention, and existing algorithms cannot always handle the most complex distributional shifts. We think the biquality data setup is a suitable framework for designing such algorithms. Biquality Learning assumes that two datasets are available at training time: a trusted dataset sampled from the distribution of interest, and an untrusted dataset affected by dataset shifts and weaknesses of supervision (together, distribution shifts). Having both the trusted and untrusted datasets at training time makes it possible to design algorithms that deal with any distribution shift. We propose two methods for biquality learning, one inspired by the label noise literature and another by the covariate shift literature. We experiment with two…
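On the covariate-shift side, a standard technique is density-ratio estimation by probabilistic classification: train a discriminator to separate trusted from untrusted samples and turn its odds into importance weights for the untrusted data, since p_trusted(x) / p_untrusted(x) = [p(s=1|x) / p(s=0|x)] * [p(s=0) / p(s=1)]. The sketch below implements that generic idea in pure Python with a hand-rolled logistic regression. The helper names, learning rate, and epoch count are hypothetical choices, and it is not presented as the authors' method.

```python
import math


def fit_logistic(features, targets, lr=0.1, epochs=500):
    """Batch gradient-descent logistic regression (unregularized, for illustration)."""
    dim = len(features[0])
    w, b, n = [0.0] * dim, 0.0, len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(features, targets):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - t  # predicted p(trusted | x) minus target
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def importance_weights(trusted_X, untrusted_X, **fit_kwargs):
    """Estimate p_trusted(x) / p_untrusted(x) for every untrusted sample."""
    X = trusted_X + untrusted_X
    s = [1] * len(trusted_X) + [0] * len(untrusted_X)  # 1 = trusted, 0 = untrusted
    w, b = fit_logistic(X, s, **fit_kwargs)
    prior_ratio = len(untrusted_X) / len(trusted_X)  # p(s=0) / p(s=1)
    weights = []
    for x in untrusted_X:
        z = b + sum(wi * xi for wi, xi in zip(w, x))
        # For a logistic model, p(s=1|x) / p(s=0|x) = exp(z).
        weights.append(math.exp(z) * prior_ratio)
    return weights


# Untrusted samples that resemble the trusted ones should receive larger weights.
print(importance_weights([[1.0], [1.2], [0.8], [1.1]], [[1.0], [0.9], [-1.0], [-1.2]]))
```

The resulting weights would multiply the per-sample loss of untrusted examples when training on the trusted and untrusted data together. Feeding the discriminator the concatenation of x and a one-hot encoding of the untrusted label y, instead of x alone, is one way to let the ratio also reflect label-level shifts; that extension is an assumption of this sketch, not a claim about the paper.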
Taxonomy
Topics: Data Stream Mining Techniques · Time Series Analysis and Forecasting · Machine Learning and Data Classification
