Data splitting improves statistical performance in overparametrized regimes
Nicole Mücke, Enrico Reiss, Jonas Rungenhagen, and Markus Klein

TL;DR
This paper demonstrates that data splitting acts as a regularizer in overparametrized ridgeless regression, enhancing statistical performance and computational efficiency across finite and infinite-dimensional settings.
Contribution
It introduces a unified framework showing data splitting's regularizing effect in overparametrized regimes, improving both accuracy and efficiency.
Findings
Data splitting improves statistical performance in overparametrized models.
Data splitting has a regularizing effect while also reducing computational complexity.
Effect demonstrated in both finite and infinite-dimensional settings.
Abstract
While large training datasets generally offer improvement in model performance, the training process becomes computationally expensive and time-consuming. Distributed learning is a common strategy to reduce the overall training time by exploiting multiple computing devices. Recently, it has been observed in the single-machine setting that overparametrization is essential for benign overfitting in ridgeless regression in Hilbert spaces. We show that in this regime, data splitting has a regularizing effect, hence improving statistical performance and computational complexity at the same time. We further provide a unified framework that allows us to analyze both the finite- and infinite-dimensional settings. We numerically demonstrate the effect of different model parameters.
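To make the idea concrete, here is a minimal sketch of split-and-average ridgeless regression in the finite-dimensional case. It is an illustration under assumptions, not the authors' implementation: the function names (`min_norm_interpolator`, `split_and_average`), the Gaussian design, the noise level, and the choice of four splits are all hypothetical. Each block is fit with the minimum-norm interpolating solution (computed via the pseudoinverse), and the local estimates are averaged.

```python
import numpy as np


def min_norm_interpolator(X, y):
    # Ridgeless (minimum-norm) least-squares solution via the pseudoinverse.
    return np.linalg.pinv(X) @ y


def split_and_average(X, y, num_splits):
    # Partition the samples into disjoint blocks, fit each block
    # independently, and average the resulting local estimates.
    blocks = np.array_split(np.arange(X.shape[0]), num_splits)
    local = [min_norm_interpolator(X[idx], y[idx]) for idx in blocks]
    return np.mean(local, axis=0)


# Hypothetical synthetic problem: more parameters (d) than samples (n).
rng = np.random.default_rng(0)
n, d = 200, 1000
w_true = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.5 * rng.normal(size=n)

w_single = split_and_average(X, y, num_splits=1)  # single-machine interpolator
w_split = split_and_average(X, y, num_splits=4)   # averaged over 4 blocks

# The single-machine solution interpolates the training data exactly;
# the averaged solution generally does not.
print("single-machine max residual:", np.abs(X @ w_single - y).max())
print("norms (single, split):", np.linalg.norm(w_single), np.linalg.norm(w_split))
```

One simple way to see a shrinkage effect in this sketch: each local minimum-norm solution has norm no larger than the single-machine one (which also interpolates that block), so by the triangle inequality the averaged estimate cannot have a larger norm. Each local fit also works with a smaller data matrix, which is where the computational savings come from. Whether averaging improves the prediction error depends on the model parameters, which is the question the paper studies.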
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Numerical methods in inverse problems
