PARIS: Pruning Algorithm via the Representer theorem for Imbalanced Scenarios
Enrico Camporeale

TL;DR
PARIS introduces a dataset pruning method for neural networks based on the representer theorem. It reduces the training set size and improves rare-event regression performance, scoring each candidate removal without retraining.
Contribution
It proposes a closed-form representer deletion residual for dataset pruning in neural networks, enabling efficient, principled removal of uninformative training samples in imbalanced regression tasks.
Findings
Reduces training set size by up to 75%
Outperforms re-weighting and oversampling methods
Improves RMSE on real-world space weather data
Abstract
The challenge of \textbf{imbalanced regression} arises when standard Empirical Risk Minimization (ERM) biases models toward high-frequency regions of the data distribution, causing severe degradation on rare but high-impact ``tail'' events. Existing strategies, such as loss re-weighting or synthetic over-sampling, often introduce noise, distort the underlying distribution, or add substantial algorithmic complexity. We introduce \textbf{PARIS} (Pruning Algorithm via the Representer theorem for Imbalanced Scenarios), a principled framework that mitigates imbalance by \emph{optimizing the training set itself}. PARIS leverages the representer theorem for neural networks to compute a \textbf{closed-form representer deletion residual}, which quantifies the exact change in validation loss caused by removing a single training point \emph{without retraining}. Combined with an efficient Cholesky…
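The idea of scoring a deletion without retraining can be illustrated with a minimal sketch. The abstract is truncated here and does not give the PARIS formula, so the code below is not the paper's method: it assumes a ridge-regression readout on fixed features (for example, frozen last-layer activations) and uses the standard Sherman-Morrison leave-one-out identity. The function name `deletion_deltas`, the regularization strength `lam`, and all array names are hypothetical.

```python
import numpy as np

def deletion_deltas(Phi, y, Phi_val, y_val, lam=1e-2):
    """Change in validation MSE from deleting each training point.

    Assumption: the model is a ridge readout w = (Phi^T Phi + lam I)^{-1} Phi^T y
    on fixed features Phi. This is a stand-in, not the PARIS residual itself.
    """
    n, d = Phi.shape
    A = Phi.T @ Phi + lam * np.eye(d)          # regularized Gram matrix (d, d)
    L = np.linalg.cholesky(A)                  # A = L L^T, factorized once
    # Z = A^{-1} Phi^T via two solves against the Cholesky factor
    Z = np.linalg.solve(L.T, np.linalg.solve(L, Phi.T))   # (d, n)
    w = Z @ y                                  # full-data weights
    e = y - Phi @ w                            # training residuals (n,)
    h = np.einsum("ij,ji->i", Phi, Z)          # leverages phi_i^T A^{-1} phi_i
    r_val = y_val - Phi_val @ w                # validation residuals (m,)
    # Leave-one-out identity: w_{-i} = w - A^{-1} phi_i e_i / (1 - h_i).
    # Column i holds the shift in validation predictions if point i is removed.
    shift = -(Phi_val @ Z) * (e / (1.0 - h))   # (m, n)
    r_new = r_val[:, None] - shift             # validation residuals after deletion
    return np.mean(r_new**2, axis=0) - np.mean(r_val**2)
```

In this sketch a negative entry means that removing the corresponding training point would lower the validation error, so such points would be the natural pruning candidates. With `lam > 0` every leverage is strictly below 1, so the division is well defined.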
Taxonomy
Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI) · Imbalanced Data Classification Techniques
