Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles

Benjamin S. Ruben; Cengiz Pehlevan

arXiv:2307.03176·stat.ML·January 11, 2024

TL;DR

This paper develops a theoretical framework for feature bagging in noisy ridge ensembles, showing how subsampling affects learning curves and proposing heterogeneous ensembling to mitigate double-descent in high-dimensional settings.

Contribution

It introduces a simplified analytical model for feature-bagging in noisy ridge ensembles and proposes heterogeneous feature ensembling as an efficient way to reduce double-descent effects.

Findings

1. Subsampling shifts the double-descent peak in learning curves.
2. Heterogeneous feature ensembling mitigates double-descent effectively.
3. Performance insights extend to linear classifiers on image datasets.

Abstract

Feature bagging is a well-established ensembling method which aims to reduce prediction variance by combining predictions of many estimators trained on subsets or projections of features. Here, we develop a theory of feature-bagging in noisy least-squares ridge ensembles and simplify the resulting learning curves in the special case of equicorrelated data. Using analytical learning curves, we demonstrate that subsampling shifts the double-descent peak of a linear predictor. This leads us to introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, as a computationally efficient method to mitigate double-descent. Then, we compare the performance of a feature-subsampling ensemble to a single linear predictor, describing a trade-off between noise amplification due to subsampling and noise reduction due to ensembling. Our qualitative…
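
The ensembling scheme described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the data model (isotropic Gaussian features with label noise, rather than the equicorrelated data analyzed in the paper), the subset sizes, the ridge parameter, the uniform averaging of member predictions, and the helper names `fit_ridge`, `fit_feature_ensemble`, and `predict_ensemble` are all assumptions chosen for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def fit_feature_ensemble(X, y, subset_sizes, lam, rng):
    """Train one ridge estimator per random feature subset.

    Passing different values in subset_sizes gives a heterogeneous
    ensemble; passing equal values gives a homogeneous one.
    """
    n_features = X.shape[1]
    members = []
    for size in subset_sizes:
        # Each member sees only a random subset of the feature columns.
        idx = rng.choice(n_features, size=size, replace=False)
        members.append((idx, fit_ridge(X[:, idx], y, lam)))
    return members

def predict_ensemble(members, X):
    """Average the member predictions (uniform weights, an assumption)."""
    preds = [X[:, idx] @ w for idx, w in members]
    return np.mean(preds, axis=0)

# Synthetic linear teacher with label noise (illustrative data model only).
rng = np.random.default_rng(0)
n_train, n_test, n_features = 60, 500, 100
w_true = rng.normal(size=n_features) / np.sqrt(n_features)
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = X_train @ w_true + 0.1 * rng.normal(size=n_train)
y_test = X_test @ w_true

# Heterogeneous ensemble: members use 20, 40, and 80 features respectively.
members = fit_feature_ensemble(
    X_train, y_train, subset_sizes=[20, 40, 80], lam=1e-3, rng=rng
)
y_hat = predict_ensemble(members, X_test)
test_mse = float(np.mean((y_hat - y_test) ** 2))
print(f"heterogeneous ensemble test MSE: {test_mse:.4f}")
```

Sweeping `n_train` for a fixed list of subset sizes and plotting `test_mse` gives an empirical learning curve that could be compared against a single ridge predictor trained on all features.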

Code & Models

Repositories

benruben87/Learning-Curves-for-Heterogeneous-Feature-Subsampled-Ridge-Ensembles (PyTorch, official)

Videos

Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning