Biquality Learning: a Framework to Design Algorithms Dealing with Closed-Set Distribution Shifts

Pierre Nodet; Vincent Lemaire; Alexis Bondu; Antoine Cornuéjols

arXiv:2308.15132·cs.LG·August 30, 2023

TL;DR

This paper introduces a biquality learning framework that leverages both trusted and untrusted datasets to develop algorithms capable of handling complex distribution shifts in machine learning.

Contribution

The paper proposes two novel methods for biquality data, one inspired by the label noise literature and one by the covariate shift literature, to address distribution shifts.

Findings

01 Two new methods for biquality learning tested on real datasets

02 Synthetic concept drift and class-conditional shifts introduced

03 Biquality learning algorithms show promise for handling distribution shifts

Abstract

Training machine learning models from data with weak supervision and dataset shifts is still challenging. Designing algorithms when these two situations arise has not been explored much, and existing algorithms cannot always handle the most complex distributional shifts. We think the biquality data setup is a suitable framework for designing such algorithms. Biquality Learning assumes that two datasets are available at training time: a trusted dataset sampled from the distribution of interest and the untrusted dataset with dataset shifts and weaknesses of supervision (aka distribution shifts). The trusted and untrusted datasets available at training time make designing algorithms dealing with any distribution shifts possible. We propose two methods, one inspired by the label noise literature and another by the covariate shift literature for biquality learning. We experiment with two…
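
To make the trusted/untrusted setup in the abstract concrete, here is a minimal sketch of one generic way to use both datasets: weight each untrusted sample by an estimate of the joint density ratio p_T(x, y) / p_U(x, y), where T is the trusted distribution and U the untrusted one. This is not the paper's method. It assumes small discrete features and labels so the ratio can be estimated by smoothed counting, and the function name `joint_ratio_weights`, its `smoothing` parameter, and the toy data are all hypothetical.

```python
from collections import Counter


def joint_ratio_weights(trusted, untrusted, smoothing=1.0):
    """Estimate w(x, y) = p_T(x, y) / p_U(x, y) by Laplace-smoothed counting.

    trusted, untrusted: lists of (x, y) pairs with hashable x and y.
    Returns one weight per untrusted sample, in the same order.
    Illustrative only: assumes a small discrete (x, y) space.
    """
    # Closed-set assumption: both datasets live on the same finite support.
    support = set(trusted) | set(untrusted)
    k = len(support)
    count_t, count_u = Counter(trusted), Counter(untrusted)
    n_t, n_u = len(trusted), len(untrusted)

    def prob(count, n, pair):
        # Smoothed empirical probability of the pair (x, y).
        return (count[pair] + smoothing) / (n + smoothing * k)

    return [prob(count_t, n_t, p) / prob(count_u, n_u, p) for p in untrusted]


# Toy data: in the trusted set the label equals the feature.
trusted = [(0, 0)] * 5 + [(1, 1)] * 5
# In the untrusted set, half of the x = 1 samples carry a flipped label.
untrusted = [(0, 0)] * 10 + [(1, 1)] * 5 + [(1, 0)] * 5

weights = joint_ratio_weights(trusted, untrusted)
```

Under these assumptions, the mislabeled pairs (1, 0) never appear in the trusted set and so receive the smallest weights, while the correctly labeled (1, 1) pairs are up-weighted. The trusted samples would keep weight 1, and the combined weighted set could then be passed to any learner that accepts per-sample weights.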

Code & Models

Repositories

pierrenodet/blds (Official)

Taxonomy

Topics: Data Stream Mining Techniques · Time Series Analysis and Forecasting · Machine Learning and Data Classification