On the Efficiency of ERM in Feature Learning
Ayoub El Hanchi, Chris J. Maddison, Murat A. Erdogdu

TL;DR
This paper analyzes the efficiency of empirical risk minimization (ERM) in feature learning. It identifies conditions under which ERM performs nearly as well as an oracle that knows the optimal feature map in advance, and it provides bounds on the excess risk.
Contribution
The paper offers a theoretical analysis of ERM's performance in feature learning, including asymptotic and non-asymptotic bounds on the excess risk, and applies these results to sparse linear regression.
Findings
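The asymptotic quantiles of ERM's excess risk match those of the oracle procedure, up to a factor of two, when the set of feature maps is not too large and the optimal feature map is unique.
The global complexity of the set of feature maps affects ERM's excess risk.
The analysis yields new guarantees for subset selection in sparse linear regression.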
Abstract
Given a collection of feature maps indexed by a set $\mathcal{T}$, we study the performance of empirical risk minimization (ERM) on regression problems with square loss over the union of the linear classes induced by these feature maps. This setup aims at capturing the simplest instance of feature learning, where the model is expected to jointly learn from the data an appropriate feature map and a linear predictor. We start by studying the asymptotic quantiles of the excess risk of sequences of empirical risk minimizers. Remarkably, we show that when the set $\mathcal{T}$ is not too large and when there is a unique optimal feature map, these quantiles coincide, up to a factor of two, with those of the excess risk of the oracle procedure, which knows a priori this optimal feature map and deterministically outputs an empirical risk minimizer from the associated optimal linear class. We…
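To make the setup concrete, the following is a minimal sketch of ERM over a union of linear classes, assuming a finite collection of feature maps and taking subset selection as the example. It is not code from the paper: the function names (`erm_over_feature_maps`, `subset_maps`), the synthetic data, and the choice of NumPy least squares are illustrative assumptions.

```python
import itertools
import numpy as np


def erm_over_feature_maps(X, y, feature_maps):
    """ERM with square loss over the union of linear classes.

    For each candidate feature map, fit a least-squares linear predictor
    on the mapped features, then keep the (map, predictor) pair with the
    smallest empirical risk. Returns (index, weights, empirical_risk).
    """
    best = None
    for t, phi in enumerate(feature_maps):
        Z = phi(X)                                   # features under map t
        w, *_ = np.linalg.lstsq(Z, y, rcond=None)    # ERM within linear class t
        risk = float(np.mean((Z @ w - y) ** 2))      # empirical square loss
        if best is None or risk < best[2]:
            best = (t, w, risk)
    return best


def subset_maps(d, k):
    """Hypothetical feature maps: one per size-k subset of the d coordinates."""
    subsets = list(itertools.combinations(range(d), k))
    maps = [lambda X, idx=list(s): X[:, idx] for s in subsets]
    return subsets, maps


# Synthetic sparse linear regression (assumed data, for illustration only):
# only coordinates 1 and 4 carry signal.
rng = np.random.default_rng(0)
n, d, k = 200, 6, 2
X = rng.standard_normal((n, d))
y = 2.0 * X[:, 1] - 3.0 * X[:, 4] + 0.1 * rng.standard_normal(n)

subsets, maps = subset_maps(d, k)
t_hat, w_hat, risk_hat = erm_over_feature_maps(X, y, maps)
selected = subsets[t_hat]

# Oracle procedure: it knows the optimal subset in advance and only
# fits the linear predictor on that subset.
oracle_index = subsets.index((1, 4))
Z_oracle = maps[oracle_index](X)
w_oracle, *_ = np.linalg.lstsq(Z_oracle, y, rcond=None)
oracle_risk = float(np.mean((Z_oracle @ w_oracle - y) ** 2))
```

Here ERM searches all 15 two-coordinate subsets, while the oracle fits only the true one. Comparing `risk_hat` with `oracle_risk` shows empirical risks only; the paper's results concern the excess (population) risk, which this sketch does not estimate.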
Taxonomy
Topics: Neural Networks and Applications
Methods: Linear Regression · Sparse Evolutionary Training
