The Risk of Machine Learning
Alberto Abadie, Maximilian Kasy

TL;DR
This paper evaluates the risk of machine learning estimators such as ridge, lasso, and pretest in economic settings that require estimating many parameters at once. It gives guidance on choosing among these estimators and tuning their regularization parameters to improve estimation accuracy.
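Take the simplest many-parameter setting, in which each parameter μ_i has one unbiased estimate X_i. There, ridge, lasso, and pretest reduce to componentwise rules that shrink X_i toward zero according to a regularization parameter λ ≥ 0. The sketch below assumes that setup. The function names are hypothetical, and the code is an illustration, not the authors' implementation.

```python
import numpy as np

def ridge(x, lam):
    """Proportional shrinkage toward zero: every estimate is divided by (1 + lam)."""
    return x / (1.0 + lam)

def lasso(x, lam):
    """Soft thresholding: reduce each magnitude by lam, setting |x| <= lam to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def pretest(x, lam):
    """Hard thresholding: keep x unchanged if |x| > lam, otherwise set it to zero."""
    return np.where(np.abs(x) > lam, x, 0.0)

# Hypothetical unbiased estimates of five parameters.
x = np.array([-3.0, -0.5, 0.0, 1.0, 2.5])
print("ridge:  ", ridge(x, 1.0))
print("lasso:  ", lasso(x, 1.0))
print("pretest:", pretest(x, 1.0))
```

Ridge shrinks every estimate by the same proportion. Lasso subtracts λ from each magnitude and zeroes out the small ones. Pretest keeps or drops each estimate without changing it. These differences explain why the three rules respond differently to sparse and dense configurations of the true parameters.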
Contribution
The paper characterizes the risk of regularized estimators. It also shows that data-driven choices of the regularization parameter achieve risk close to that of the optimal choice, which helps applied researchers select and tune estimators.
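The sketch below illustrates what a data-driven choice can look like. It assumes the canonical normal-means setup, with one unbiased, unit-variance estimate X_i per parameter μ_i. It tunes the lasso threshold by minimizing Stein's unbiased risk estimate (SURE), which is one possible data-driven criterion. With unit noise variance, SURE for soft thresholding is SURE(λ) = n − 2·#{i : |X_i| ≤ λ} + Σ_i min(X_i², λ²). The sketch then compares the resulting loss with an infeasible oracle that picks λ using the true μ. The sparse design, the grid, and all names are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def lasso(x, lam):
    """Soft-thresholding estimator applied componentwise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sure_lasso(x, lam):
    """SURE for soft thresholding, per parameter, assuming unit noise variance."""
    n = x.size
    return (n - 2 * np.sum(np.abs(x) <= lam) + np.sum(np.minimum(x**2, lam**2))) / n

rng = np.random.default_rng(0)
n = 10_000
mu = np.zeros(n)
mu[: n // 20] = 4.0               # hypothetical sparse design: 5% of effects are nonzero
x = mu + rng.standard_normal(n)   # one unbiased, unit-variance estimate per parameter

grid = np.linspace(0.0, 5.0, 201)
sure = np.array([sure_lasso(x, lam) for lam in grid])                  # feasible: uses x only
loss = np.array([np.mean((lasso(x, lam) - mu) ** 2) for lam in grid])  # infeasible: uses mu

lam_sure = grid[np.argmin(sure)]
lam_oracle = grid[np.argmin(loss)]
loss_sure = np.mean((lasso(x, lam_sure) - mu) ** 2)
loss_oracle = loss.min()

print(f"SURE-tuned:   lambda = {lam_sure:.3f}, loss = {loss_sure:.4f}")
print(f"oracle-tuned: lambda = {lam_oracle:.3f}, loss = {loss_oracle:.4f}")
print(f"unregularized (lambda = 0) loss = {np.mean((x - mu) ** 2):.4f}")
```

The oracle loss is a lower bound on the grid by construction. The point of the comparison is that the SURE-tuned loss, which is computed without knowing μ, should land close to that bound and far below the loss of the unregularized estimates.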
Findings
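The risk of each regularized estimator, and therefore which one performs best, depends on features of the data-generating distribution.
Data-driven choices of regularization parameters perform nearly as well as the optimal choice.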
Guidance for applied economists on estimator selection and tuning.
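The first finding can be illustrated by holding the number of parameters fixed and changing only how the true effects are distributed. The sketch below assumes the normal-means setup with unit-variance estimates. It tunes each estimator's λ with an infeasible oracle that uses the true μ over a grid, and compares a hypothetical sparse design with a hypothetical dense one. It is an illustration under these assumptions, not a reproduction of the paper's analysis.

```python
import numpy as np

def ridge(x, lam):
    return x / (1.0 + lam)

def lasso(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def pretest(x, lam):
    return np.where(np.abs(x) > lam, x, 0.0)

ESTIMATORS = {"ridge": ridge, "lasso": lasso, "pretest": pretest}
GRID = np.linspace(0.0, 10.0, 401)  # hypothetical grid of regularization parameters

def best_losses(mu, x):
    """Smallest average squared-error loss over GRID for each estimator (oracle tuning)."""
    return {
        name: float(min(np.mean((est(x, lam) - mu) ** 2) for lam in GRID))
        for name, est in ESTIMATORS.items()
    }

rng = np.random.default_rng(1)
n = 10_000
# Two hypothetical distributions of true effects with the same number of parameters.
mu_sparse = np.where(np.arange(n) < n // 20, 4.0, 0.0)  # 5% large effects, the rest exactly zero
mu_dense = rng.standard_normal(n)                         # every effect nonzero and moderate

results = {}
for label, mu in [("sparse", mu_sparse), ("dense", mu_dense)]:
    x = mu + rng.standard_normal(n)  # unbiased, unit-variance estimates
    results[label] = best_losses(mu, x)
    print(label, {name: round(v, 3) for name, v in results[label].items()})
```

Under these assumptions, the thresholding rules (lasso and pretest) should do better in the sparse design, where most effects are exactly zero. Ridge should do better in the dense design. With standard normal effects and unit noise, ridge with λ = 1 equals the posterior mean X_i/2, so no rule can beat it there in expectation.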
Abstract
Many applied settings in empirical economics involve simultaneous estimation of a large number of parameters. In particular, applied economists are often interested in estimating the effects of many-valued treatments (like teacher effects or location effects), treatment effects for many groups, and prediction models with many regressors. In these settings, machine learning methods that combine regularized estimation and data-driven choices of regularization parameters are useful to avoid over-fitting. In this article, we analyze the performance of a class of machine learning estimators that includes ridge, lasso and pretest in contexts that require simultaneous estimation of many parameters. Our analysis aims to provide guidance to applied researchers on (i) the choice between regularized estimators in practice and (ii) data-driven selection of regularization parameters. To address (i),…
Taxonomy
Topics: Statistical Methods and Inference · Advanced Causal Inference Techniques · Statistical Methods and Bayesian Inference
