On the Efficiency of ERM in Feature Learning

Ayoub El Hanchi; Chris J. Maddison; Murat A. Erdogdu

arXiv:2411.12029 · stat.ML · November 20, 2024

TL;DR

This paper analyzes the efficiency of empirical risk minimization (ERM) in feature learning. It identifies conditions under which ERM performs nearly as well as an oracle that knows the optimal feature map in advance, and it provides bounds on the excess risk.

Contribution

It offers a theoretical analysis of ERM's performance in feature learning, including asymptotic and non-asymptotic bounds, and applies results to sparse linear regression.

Findings

01

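When the feature set $T$ is not too large and the optimal feature map is unique, the asymptotic quantiles of ERM's excess risk match those of the oracle procedure up to a factor of two.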

02

Global complexity of feature set affects ERM excess risk.

03

New guarantees for subset selection in sparse linear regression.

Abstract

Given a collection of feature maps indexed by a set $T$, we study the performance of empirical risk minimization (ERM) on regression problems with square loss over the union of the linear classes induced by these feature maps. This setup aims at capturing the simplest instance of feature learning, where the model is expected to jointly learn from the data an appropriate feature map and a linear predictor. We start by studying the asymptotic quantiles of the excess risk of sequences of empirical risk minimizers. Remarkably, we show that when the set $T$ is not too large and when there is a unique optimal feature map, these quantiles coincide, up to a factor of two, with those of the excess risk of the oracle procedure, which knows a priori this optimal feature map and deterministically outputs an empirical risk minimizer from the associated optimal linear class. We…
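The setup in the abstract can be illustrated with a minimal sketch, assuming a finite index set $T$ and synthetic data. Here each feature map selects a subset of coordinates, so ERM over the union of the induced linear classes reduces to best subset selection in sparse linear regression. The function names (`fit_linear`, `erm_over_feature_maps`, `subset_maps`), the data-generating model, and all constants are hypothetical choices for illustration and do not come from the paper. The sketch compares empirical risks only; it does not compute the excess risk that the paper analyzes.

```python
from itertools import combinations

import numpy as np


def fit_linear(features, y):
    """Least squares on a fixed feature matrix; returns weights and empirical risk."""
    w, *_ = np.linalg.lstsq(features, y, rcond=None)
    risk = float(np.mean((features @ w - y) ** 2))
    return w, risk


def erm_over_feature_maps(X, y, feature_maps):
    """ERM over the union of linear classes: minimize jointly over (t, w)."""
    best = None
    for t, phi in feature_maps.items():
        w, risk = fit_linear(phi(X), y)
        if best is None or risk < best[2]:
            best = (t, w, risk)
    return best


def subset_maps(d, s):
    """Hypothetical index set T: all size-s coordinate subsets of {0, ..., d-1}."""
    return {S: (lambda X, S=S: X[:, list(S)]) for S in combinations(range(d), s)}


# Synthetic sparse regression problem (assumed for illustration only).
rng = np.random.default_rng(0)
n, d, s = 200, 6, 2
true_support = (1, 4)
X = rng.standard_normal((n, d))
y = X[:, list(true_support)] @ np.array([2.0, -3.0]) + 0.1 * rng.standard_normal(n)

maps = subset_maps(d, s)

# ERM learns the feature map and the linear predictor jointly from the data.
t_hat, w_hat, erm_risk = erm_over_feature_maps(X, y, maps)

# The oracle knows the optimal feature map a priori and only fits the weights.
w_oracle, oracle_risk = fit_linear(maps[true_support](X), y)

print("ERM selected subset:", t_hat)
print("ERM empirical risk:", erm_risk)
print("Oracle empirical risk:", oracle_risk)
```

Because the oracle's linear class is one of the classes ERM searches over, the empirical risk of ERM can never exceed that of the oracle. The question the paper addresses is how much this extra search costs in excess risk on new data.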

Taxonomy

Topics: Neural Networks and Applications

Methods: Linear Regression · Sparse Evolutionary Training