Double Machine Learning and Automated Confounder Selection – A Cautionary Tale

Paul Hünermund (Copenhagen Business School); Beyers Louw (Maastricht University); Itamar Caspi (Bank of Israel)

arXiv:2108.11294 · econ.EM · May 25, 2023

Open Access

TL;DR

This paper warns that double machine learning (DML) can be highly sensitive to the inclusion of a few endogenous covariates ("bad controls"), which can bias its estimates and invalidate causal inference when control variables are selected automatically in high-dimensional settings.

Contribution

It highlights the risks and limitations of using DML for automated confounder selection, especially when endogenous variables ("bad controls") enter the covariate set.
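For reference, DML is usually presented in the partially linear model below. This is the standard formulation from the DML literature, not necessarily the paper's exact notation. The nuisance functions ℓ0(X) = E[Y | X] and m0(X) = E[D | X] are estimated with machine-learning methods on held-out folds (cross-fitting), and θ̂ is the partialling-out estimator. Identification rests on the condition E[ε | D, X] = 0. If X contains a variable that is itself affected by D or Y, conditioning on it can break this condition, and θ̂ then converges to something other than θ0.

```latex
\begin{aligned}
Y &= \theta_0 D + g_0(X) + \varepsilon, & \mathbb{E}[\varepsilon \mid D, X] &= 0,\\
D &= m_0(X) + V, & \mathbb{E}[V \mid X] &= 0,\\
\hat\theta &= \frac{\sum_{i=1}^{n} \bigl(D_i - \hat m(X_i)\bigr)\bigl(Y_i - \hat\ell(X_i)\bigr)}{\sum_{i=1}^{n} \bigl(D_i - \hat m(X_i)\bigr)^2}.
\end{aligned}
```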

Findings

01. DML is highly sensitive to the inclusion of even a few bad controls.

02. Including endogenous variables in the covariate set biases the estimates.

03. Selecting control variables in a purely data-driven way can therefore be unreliable.
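To see how a single bad control can bias the estimate, consider a hypothetical data-generating process. It is an illustrative assumption, not the paper's simulation design. X confounds D and Y, and C is a collider caused by both the treatment and the outcome. The noise terms V, ε, U are independent standard normal and independent of X. Partialling X out of every variable leaves the residuals below. Because everything is Gaussian, adding C to the controls turns the DML target into the coefficient on V in a linear projection of Ỹ on (V, C̃):

```latex
% Hypothetical DGP: X confounds D and Y; C is a collider (a "bad control")
\begin{aligned}
D &= m_0(X) + V, \qquad Y = \theta_0 D + g_0(X) + \varepsilon, \qquad C = D + Y + U,\\
% Residuals after partialling out X
\tilde Y &= \theta_0 V + \varepsilon, \qquad \tilde C = (1+\theta_0)V + \varepsilon + U,\\
% Coefficient on V when \tilde Y is projected on (V, \tilde C)
\theta_C &= \frac{\operatorname{Var}(\tilde C)\operatorname{Cov}(\tilde Y, V) - \operatorname{Cov}(V, \tilde C)\operatorname{Cov}(\tilde Y, \tilde C)}{\operatorname{Var}(V)\operatorname{Var}(\tilde C) - \operatorname{Cov}(V, \tilde C)^2}\\
&= \frac{\bigl[(1+\theta_0)^2 + 2\bigr]\theta_0 - (1+\theta_0)\bigl[\theta_0(1+\theta_0) + 1\bigr]}{\bigl[(1+\theta_0)^2 + 2\bigr] - (1+\theta_0)^2}
= \frac{\theta_0 - 1}{2}.
\end{aligned}
```

The resulting bias is θ_C − θ0 = −(1 + θ0)/2. With θ0 = 1 the estimand collapses to 0. With a true null effect (θ0 = 0), DML instead targets −1/2, which is a spurious negative effect.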

Abstract

Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Even though the ability to deal with a large number of potential covariates can render selection-on-observables assumptions more plausible, there is at the same time a growing risk that endogenous variables are included, which would lead to the violation of conditional independence. This paper demonstrates that DML is very sensitive to the inclusion of only a few "bad controls" in the covariate space. The resulting bias varies with the nature of the theoretical causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way.
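A minimal simulation sketch, using only NumPy, of how one bad control distorts a DML estimate. Several parts are assumptions made for illustration. The data-generating process, with its 0.5 coefficients and a collider C caused by both D and Y, is hypothetical. So are the helper names simulate, ols_fit_predict and dml_plr. Ordinary least squares stands in for the machine-learning nuisance learners (lasso, random forests, etc.) that DML normally uses, which is adequate here only because the simulated nuisance functions are linear. The estimator itself is the partialling-out (residual-on-residual) DML estimator with K-fold cross-fitting.

```python
import numpy as np


def simulate(n, p, theta, rng):
    """Hypothetical DGP: X confounds D and Y; C is a collider ("bad control")."""
    X = rng.normal(size=(n, p))
    a = np.full(p, 0.5)                          # assumed effect of X on treatment
    b = np.full(p, 0.5)                          # assumed effect of X on outcome
    D = X @ a + rng.normal(size=n)               # treatment
    Y = theta * D + X @ b + rng.normal(size=n)   # outcome; true effect = theta
    C = D + Y + rng.normal(size=n)               # caused by both D and Y
    return X, D, Y, C


def ols_fit_predict(W_train, y_train, W_test):
    """Linear nuisance learner with intercept (stand-in for an ML learner)."""
    A_train = np.column_stack([np.ones(len(W_train)), W_train])
    A_test = np.column_stack([np.ones(len(W_test)), W_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def dml_plr(Y, D, W, n_folds=5, seed=0):
    """Partialling-out DML estimate of theta in Y = theta*D + g(W) + e."""
    n = len(Y)
    # Assign each observation to one of n_folds roughly equal folds
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Cross-fitting: nuisances are fit on the other folds, evaluated on fold k
        y_res[test] = Y[test] - ols_fit_predict(W[train], Y[train], W[test])
        d_res[test] = D[test] - ols_fit_predict(W[train], D[train], W[test])
    # Residual-on-residual regression of Y on D
    return np.sum(d_res * y_res) / np.sum(d_res ** 2)


rng = np.random.default_rng(42)
X, D, Y, C = simulate(n=2000, p=5, theta=1.0, rng=rng)

theta_clean = dml_plr(Y, D, X)                       # controls: confounders only
theta_bad = dml_plr(Y, D, np.column_stack([X, C]))   # controls: confounders + collider
print(f"controls = X:    {theta_clean:.3f}")
print(f"controls = X, C: {theta_bad:.3f}")
```

With this setup, theta_clean should land close to the true value of 1. By contrast, theta_bad should land close to 0, which is its population value under this particular DGP. Adding a single collider to an otherwise valid control set is enough to wipe out the estimated effect, even though nothing else about the estimator changes.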

Taxonomy

Topics: Statistical Methods and Inference · Forecasting Techniques and Applications · Advanced Statistical Process Monitoring