Corruptions of Supervised Learning Problems: Typology and Mitigations
Laura Iacovissi, Nan Lu, Robert C. Williamson

TL;DR
This paper develops a comprehensive theory of data corruption in supervised learning, unifying models, analyzing impacts on learning, and proposing generalized mitigation strategies for various corruption types.
Contribution
The paper introduces a unified corruption framework, analyzes how corruption affects learning risks, and extends mitigation methods to handle diverse corruption scenarios.
Findings
A unified corruption model distinguishes different corruption types.
Corruption affects the Bayes risk differently depending on its type.
Loss correction methods are generalized to attribute and joint corruptions.
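To make the findings above more concrete, here is a minimal sketch of the simplest special case: corruption of labels only, modeled as a finite Markov kernel (a row-stochastic matrix). It is not the paper's generalized method for attribute or joint corruption. It illustrates the classical backward loss correction for label noise and how the pointwise Bayes risk of the 0-1 loss can change under corruption. The function names, the kernel `T`, and all numeric values are assumptions chosen for illustration.

```python
import numpy as np

def corrupt_posterior(eta, T):
    """Push a clean class posterior through a label-noise kernel.

    eta: shape [K], clean posterior P(Y = i | x).
    T:   shape [K, K], T[i, j] = P(corrupted label = j | clean label = i);
         each row sums to 1 (hypothetical kernel, assumed known).
    """
    return eta @ T

def bayes_risk_01(eta):
    """Pointwise Bayes risk of the 0-1 loss: 1 minus the largest posterior."""
    return 1.0 - np.max(eta)

def backward_corrected_loss(loss_vec, T):
    """Classical backward correction (assumes T is invertible).

    loss_vec[i] is the loss of a fixed prediction when the clean label is i.
    The result solves T @ corrected = loss_vec, so the expectation of the
    corrected loss under corrupted labels equals the clean expected loss.
    """
    return np.linalg.solve(T, loss_vec)

# Illustrative values only (not taken from the paper).
eta = np.array([0.9, 0.1])                 # clean posterior at one input x
T = np.array([[0.8, 0.2],
              [0.3, 0.7]])                 # asymmetric label noise
pred = np.array([0.7, 0.3])                # some model's predicted probabilities
loss = -np.log(pred)                       # cross-entropy loss per clean label

eta_corrupted = corrupt_posterior(eta, T)
corrected = backward_corrected_loss(loss, T)

print("clean Bayes risk:    ", bayes_risk_01(eta))
print("corrupted Bayes risk:", bayes_risk_01(eta_corrupted))
print("clean expected loss: ", eta @ loss)
print("corrected expected loss on corrupted labels:", eta_corrupted @ corrected)
```

The last two printed values agree because the corrected loss is defined through the inverse of `T`. This relies on `T` being known and invertible, which is an assumption of this sketch.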
Abstract
Corruption is notoriously widespread in data collection. Despite extensive research, the existing literature predominantly focuses on specific settings and learning scenarios, lacking a unified view of corruption modelization and mitigation. In this work, we develop a general theory of corruption, which incorporates all modifications to a supervised learning problem, including changes in model class and loss. Focusing on changes to the underlying probability distributions via Markov kernels, our approach leads to three novel opportunities. First, it enables the construction of a novel, provably exhaustive corruption framework, distinguishing among different corruption types. This serves to unify existing models and establish a consistent nomenclature. Second, it facilitates a systematic analysis of corruption's consequences on learning tasks, by comparing Bayes risks in the clean and…
Taxonomy
Topics: Imbalanced Data Classification Techniques · Corruption and Economic Development
Methods: Focus
