Corruptions of Supervised Learning Problems: Typology and Mitigations

Laura Iacovissi; Nan Lu; Robert C. Williamson

arXiv:2307.08643 · cs.LG · May 19, 2026

TL;DR

This paper develops a general theory of data corruption in supervised learning: it unifies existing corruption models, analyzes how corruption affects learning, and proposes generalized mitigation strategies for different corruption types.

Contribution

The paper introduces a unified corruption framework, analyzes how each corruption type affects learning risk, and extends existing mitigation methods to a broader range of corruption scenarios.

Findings

01. The unified corruption model distinguishes among different corruption types.

02. The effect of corruption on the Bayes risk depends on the corruption type.

03. Loss correction methods are generalized to attribute and joint corruptions.
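
For orientation, the classical label-noise special case of loss correction (often called backward correction) can be sketched as follows. This is an illustration of the kind of method the paper generalizes, not the paper's own construction for attribute or joint corruptions. The names `backward_corrected_losses`, `kernel`, and `losses` are hypothetical, and the sketch assumes a finite label space and a known, invertible label-noise kernel.

```python
import numpy as np

def backward_corrected_losses(losses, kernel):
    """Return corrected losses whose expectation under label noise equals the clean loss.

    losses[y]        : loss of one fixed prediction against clean label y
    kernel[y, y_tld] : P(corrupted label = y_tld | clean label = y), rows sum to 1
    Assumption: kernel is known and invertible.
    """
    # Solve kernel @ corrected = losses, i.e. corrected = kernel^{-1} @ losses.
    return np.linalg.solve(kernel, losses)

# Hypothetical two-class label-noise kernel (row-stochastic matrix).
kernel = np.array([[0.8, 0.2],
                   [0.3, 0.7]])

# Hypothetical losses of a fixed prediction against clean labels 0 and 1.
losses = np.array([0.1, 2.3])

corrected = backward_corrected_losses(losses, kernel)

# Averaging the corrected loss over the noisy label, given each clean label,
# recovers the clean loss.
expected_under_noise = kernel @ corrected
```

The design choice here is to correct the loss rather than the data: training on corrupted labels with `corrected` is, in expectation, equivalent to training on clean labels with `losses`, provided the kernel assumption holds.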

Abstract

Corruption is notoriously widespread in data collection. Despite extensive research, the existing literature predominantly focuses on specific settings and learning scenarios, lacking a unified view of corruption modelization and mitigation. In this work, we develop a general theory of corruption, which incorporates all modifications to a supervised learning problem, including changes in model class and loss. Focusing on changes to the underlying probability distributions via Markov kernels, our approach leads to three novel opportunities. First, it enables the construction of a novel, provably exhaustive corruption framework, distinguishing among different corruption types. This serves to unify existing models and establish a consistent nomenclature. Second, it facilitates a systematic analysis of corruption's consequences on learning tasks, by comparing Bayes risks in the clean and…
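
As a minimal sketch of the idea of modeling corruption with a Markov kernel and comparing Bayes risks, consider a finite toy problem where only the labels are corrupted. Everything here is an assumption for illustration: the names `corrupt_labels` and `bayes_risk_01`, the toy numbers, the restriction to finite spaces, and the choice of 0-1 loss are not taken from the paper.

```python
import numpy as np

def corrupt_labels(joint, kernel):
    """Push a joint distribution through a label-corrupting Markov kernel.

    joint[x, y]      : P(X = x, Y = y) on finite spaces
    kernel[y, y_tld] : P(corrupted label = y_tld | clean label = y), rows sum to 1
    Returns the joint distribution of (X, corrupted label).
    """
    return joint @ kernel

def bayes_risk_01(joint):
    """Bayes risk under 0-1 loss: 1 - sum_x max_y P(X = x, Y = y)."""
    return 1.0 - joint.max(axis=1).sum()

# Hypothetical clean joint distribution over 2 attribute values and 2 labels.
joint = np.array([[0.40, 0.10],
                  [0.05, 0.45]])

# Hypothetical label-noise kernel (row-stochastic matrix).
kernel = np.array([[0.8, 0.2],
                   [0.3, 0.7]])

corrupted = corrupt_labels(joint, kernel)

clean_risk = bayes_risk_01(joint)          # risk of the clean problem
corrupted_risk = bayes_risk_01(corrupted)  # risk of predicting the corrupted label

print(f"clean Bayes risk:     {clean_risk:.3f}")
print(f"corrupted Bayes risk: {corrupted_risk:.3f}")
```

In this toy case the corrupted problem has the larger Bayes risk, which is the kind of clean-versus-corrupted comparison the abstract describes; other corruption types would act on the attributes or on the joint distribution instead of the labels.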

Taxonomy

Topics: Imbalanced Data Classification Techniques · Corruption and Economic Development

Methods: Focus