PARIS: Pruning Algorithm via the Representer theorem for Imbalanced Scenarios

Enrico Camporeale

arXiv:2512.06950·stat.ML·December 9, 2025

Open Access

TL;DR

PARIS is a dataset pruning method for neural networks based on the representer theorem. It shrinks the training set and improves rare-event regression performance, and it scores the effect of deleting each training sample in closed form, without retraining.

Contribution

The paper proposes a closed-form deletion residual for dataset pruning in neural networks, enabling efficient, principled removal of uninformative training samples in imbalanced regression tasks.

Findings

01 Reduces training set size by up to 75%

02 Outperforms re-weighting and oversampling methods

03 Improves RMSE on real-world space weather data

Abstract

The challenge of imbalanced regression arises when standard Empirical Risk Minimization (ERM) biases models toward high-frequency regions of the data distribution, causing severe degradation on rare but high-impact "tail" events. Existing strategies such as loss re-weighting or synthetic over-sampling often introduce noise, distort the underlying distribution, or add substantial algorithmic complexity. We introduce PARIS (Pruning Algorithm via the Representer theorem for Imbalanced Scenarios), a principled framework that mitigates imbalance by optimizing the training set itself. PARIS leverages the representer theorem for neural networks to compute a closed-form representer deletion residual, which quantifies the exact change in validation loss caused by removing a single training point without retraining. Combined with an efficient Cholesky…
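To make the idea of a closed-form deletion residual concrete, here is a minimal sketch that uses kernel ridge regression as a stand-in model. This is an assumption for illustration only: it is not the formula from the paper, whose details are not given in the excerpt above. The names `rbf_kernel`, `deletion_residuals`, `gamma`, and `lam` are hypothetical. For kernel ridge with A = K + lam·I and coefficients alpha = A⁻¹y, deleting training point i yields coefficients alpha − A⁻¹eᵢ · alphaᵢ / (A⁻¹)ᵢᵢ, so the change in validation loss for every candidate deletion follows from a single Cholesky factorization, with no refitting.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Pairwise squared Euclidean distances between rows of X and rows of Z.
    d2 = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def deletion_residuals(K, y, K_val, y_val, lam=1e-2):
    """Change in validation MSE caused by deleting each training point.

    Kernel ridge stand-in (an assumption, not the paper's method).
    Entry i is loss(without point i) - loss(with all points).
    """
    n = K.shape[0]
    A = K + lam * np.eye(n)
    # One Cholesky factorization A = L L^T, reused to form A^{-1}.
    L = np.linalg.cholesky(A)
    A_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n)))
    alpha = A_inv @ y
    base_pred = K_val @ alpha
    base_loss = np.mean((y_val - base_pred) ** 2)
    # Column i holds the coefficient shift caused by deleting point i:
    # A_inv[:, i] * alpha[i] / A_inv[i, i].
    shift = A_inv * (alpha / np.diag(A_inv))[None, :]
    # Validation predictions after each deletion, one column per deleted point.
    preds_del = base_pred[:, None] - K_val @ shift
    losses_del = np.mean((y_val[:, None] - preds_del) ** 2, axis=0)
    return losses_del - base_loss

# Toy data (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
X_val = rng.normal(size=(10, 2))
y_val = np.sin(X_val[:, 0])

lam = 1e-2
K = rbf_kernel(X, X)
K_val = rbf_kernel(X_val, X)
delta = deletion_residuals(K, y, K_val, y_val, lam=lam)

# Negative entries mark points whose removal lowers the validation loss.
prune_candidates = np.argsort(delta)[:5]
print(prune_candidates)
```

In this sketch a negative entry of `delta` means that removing the point would lower the validation loss, so the most negative entries are the natural pruning candidates. Forming the full inverse is chosen for readability; for large training sets one would avoid it.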

Taxonomy

Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI) · Imbalanced Data Classification Techniques