LARP: Learner-Agnostic Robust Data Prefiltering
Kristian Minchev, Dimitar Iliev Dimitrov, Nikola Konstantinov

TL;DR
This paper introduces a formal framework for learner-agnostic robust data prefiltering (LARP): filtering a contaminated dataset once so that a pre-specified set of downstream learners is protected from corrupted data. It studies the resulting trade-off between robustness and utility through theoretical analysis and experiments.
Contribution
It formalizes the LARP problem, analyzes its theoretical hardness, and evaluates the utility trade-offs of learner-agnostic versus learner-specific prefiltering methods.
Findings
Prefiltering for multiple learners can lead to utility loss compared to learner-specific approaches.
Theoretical hardness results highlight challenges in universal prefiltering.
Experimental results show statistically significant utility reduction with learner-agnostic prefiltering.
Abstract
The widespread availability of large public datasets is a key factor behind the recent successes of statistical inference and machine learning methods. However, these datasets often contain some low-quality or contaminated data, to which many learning procedures are sensitive. Therefore, the question of whether and how public datasets should be prefiltered to facilitate accurate downstream learning arises. On a technical level this requires the construction of principled data prefiltering methods which are learner-agnostic robust, in the sense of provably protecting a set of pre-specified downstream learners from corrupted data. In this work, we formalize the problem of Learner-Agnostic Robust data Prefiltering (LARP), which aims at finding prefiltering procedures that minimize a worst-case loss over a pre-specified set of learners. We first instantiate our framework in the context of…
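The worst-case objective described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method: it assumes a one-dimensional mean-estimation task, a hypothetical distance-from-median prefilter (`quantile_prefilter`), three illustrative learners, and access to the true mean for scoring, which is only available in simulation. It picks the filter strength that minimizes the largest error over the learner set.

```python
import numpy as np

def make_contaminated_sample(n=200, eps=0.1, true_mean=0.0, outlier_value=8.0, seed=0):
    """Draw n points: a (1 - eps) fraction from N(true_mean, 1), the rest fixed outliers."""
    rng = np.random.default_rng(seed)
    n_bad = int(eps * n)
    clean = rng.normal(true_mean, 1.0, size=n - n_bad)
    bad = np.full(n_bad, outlier_value)
    return np.concatenate([clean, bad])

def quantile_prefilter(x, q):
    """Hypothetical prefilter: drop roughly the fraction q of points farthest from the median."""
    dist = np.abs(x - np.median(x))
    return x[dist <= np.quantile(dist, 1.0 - q)]

def trimmed_mean(x):
    """Mean after discarding the lowest and highest 10% of the points."""
    k = len(x) // 10
    return np.mean(np.sort(x)[k:len(x) - k])

# Illustrative learner set (an assumption, not taken from the paper).
LEARNERS = {"mean": np.mean, "median": np.median, "trimmed_mean": trimmed_mean}

def worst_case_loss(filtered, true_mean, learners=LEARNERS):
    """Largest absolute estimation error over all learners on the filtered data."""
    return max(abs(learn(filtered) - true_mean) for learn in learners.values())

def select_prefilter(x, true_mean, qs, learners=LEARNERS):
    """Return the filter strength q with the smallest worst-case loss, plus all losses."""
    losses = {q: worst_case_loss(quantile_prefilter(x, q), true_mean, learners) for q in qs}
    best_q = min(losses, key=losses.get)
    return best_q, losses

if __name__ == "__main__":
    sample = make_contaminated_sample()
    best_q, losses = select_prefilter(sample, true_mean=0.0, qs=[0.0, 0.1, 0.2, 0.3])
    print("worst-case loss per q:", losses)
    print("selected q:", best_q)
```

A learner-specific variant would instead choose a separate `q` for each learner, which is the comparison behind the utility loss reported in the findings.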
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Machine Learning and Algorithms
Methods: Sparse Evolutionary Training
