LARP: Learner-Agnostic Robust Data Prefiltering

Kristian Minchev; Dimitar Iliev Dimitrov; Nikola Konstantinov

arXiv:2506.20573 · stat.ML · July 11, 2025

Open Access

TL;DR

This paper introduces a formal framework for learner-agnostic robust data prefiltering (LARP): filtering a dataset once so that multiple downstream learners are protected from corrupted data. It studies the resulting trade-off between robustness and utility through theoretical analysis and experiments.

Contribution

It formalizes the LARP problem, analyzes its theoretical hardness, and evaluates the utility trade-offs of learner-agnostic versus learner-specific prefiltering methods.

Findings

01. Prefiltering for multiple learners can lead to utility loss compared to learner-specific approaches.

02. Theoretical hardness results highlight challenges in universal prefiltering.

03. Experimental results show statistically significant utility reduction with learner-agnostic prefiltering.

Abstract

The widespread availability of large public datasets is a key factor behind the recent successes of statistical inference and machine learning methods. However, these datasets often contain some low-quality or contaminated data, to which many learning procedures are sensitive. Therefore, the question arises of whether and how public datasets should be prefiltered to facilitate accurate downstream learning. On a technical level, this requires the construction of principled data prefiltering methods that are learner-agnostic robust, in the sense of provably protecting a set of pre-specified downstream learners from corrupted data. In this work, we formalize the problem of Learner-Agnostic Robust data Prefiltering (LARP), which aims at finding prefiltering procedures that minimize a worst-case loss over a pre-specified set of learners. We first instantiate our framework in the context of…
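
The worst-case objective described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method. The filter family (keep points within a threshold of the sample median), the two learners (mean and median), the squared-error loss, the contamination model, and every function name are hypothetical choices made only to show what "minimize a worst-case loss over a pre-specified set of learners" might look like in code. It also assumes the true parameter is known, which holds only in simulation.

```python
import random
import statistics


def make_filter(threshold):
    """Hypothetical prefilter: keep points within `threshold` of the sample median."""
    def prefilter(data):
        center = statistics.median(data)
        return [x for x in data if abs(x - center) <= threshold]
    return prefilter


# Assumed set of downstream learners: both estimate a location parameter.
LEARNERS = {"mean": statistics.fmean, "median": statistics.median}


def worst_case_loss(prefilter, datasets, true_value):
    """Largest average squared error across the learner set after prefiltering."""
    worst = 0.0
    for learn in LEARNERS.values():
        errors = [(learn(prefilter(d)) - true_value) ** 2 for d in datasets]
        worst = max(worst, statistics.fmean(errors))
    return worst


def select_agnostic_filter(thresholds, datasets, true_value):
    """Pick the threshold whose filter minimizes the worst-case loss."""
    return min(
        thresholds,
        key=lambda t: worst_case_loss(make_filter(t), datasets, true_value),
    )


def contaminated_sample(rng, n=200, eps=0.1, true_value=0.0, outlier=25.0):
    """Assumed contamination model: a fraction `eps` of points is replaced by an outlier."""
    n_bad = int(eps * n)
    clean = [rng.gauss(true_value, 1.0) for _ in range(n - n_bad)]
    return clean + [outlier] * n_bad


rng = random.Random(0)
datasets = [contaminated_sample(rng) for _ in range(20)]
thresholds = [1.0, 3.0, 100.0]  # 100.0 keeps everything, i.e. no filtering

best = select_agnostic_filter(thresholds, datasets, true_value=0.0)
losses = {t: worst_case_loss(make_filter(t), datasets, 0.0) for t in thresholds}
print("selected threshold:", best)
print("worst-case losses:", losses)
```

In this toy setup the unfiltered data hurts the mean learner far more than the median learner, so the worst case over the learner set is driven by the more sensitive learner. A learner-specific filter would instead be tuned against a single entry of `LEARNERS`.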

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Machine Learning and Algorithms

Methods: Sparse Evolutionary Training