From controlled to undisciplined data: estimating causal effects in the era of data science using a potential outcome framework
Francesca Dominici, Falco J. Bargagli-Stoffi, Fabrizia Mealli

TL;DR
This paper reviews causal inference principles, emphasizing the importance of study design and assumptions, and discusses integrating big data and machine learning for estimating causal effects.
Contribution
It clarifies core causal inference principles and advocates for combining experimental thinking with big data and ML, emphasizing study design over data quantity.
Findings
Experimental thinking is crucial in causal inference.
Data quality and study design are more important than data quantity.
Big data and ML should complement, not replace, thoughtful study design.
Abstract
This paper discusses the fundamental principles of causal inference, the area of statistics that estimates the effect of specific occurrences, treatments, interventions, and exposures on a given outcome from experimental and observational data. We explain the key assumptions required to identify causal effects, and highlight the challenges associated with the use of observational data. We emphasize that experimental thinking is crucial in causal inference. The quality of the data (not necessarily the quantity), the study design, the degree to which the assumptions are met, and the rigor of the statistical analysis allow us to credibly infer causal effects. Although we advocate leveraging big data and applying machine learning (ML) algorithms to estimate causal effects, they are not a substitute for thoughtful study design. Concepts are illustrated via examples.
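The gap between experimental and observational data can be sketched with a small potential-outcomes simulation. This is a minimal illustration under an assumed, hypothetical data-generating process, not an example taken from the paper. A binary confounder X raises both the outcome and, in the observational setting, the probability of treatment. Each unit has two potential outcomes, Y(0) and Y(1), with a constant effect of 2, and only one of them is observed. The function name `simulate` and all numeric settings (effect size 2, confounder weight 5, assignment probabilities 0.8/0.2) are illustrative assumptions.

```python
import random
import statistics


def simulate(n, randomized, seed=0):
    """Return (true ATE, naive difference in means) for one simulated dataset.

    Hypothetical data-generating process, for illustration only:
    a binary confounder X affects both Y(0) and, if not randomized,
    the probability of receiving treatment.
    """
    rng = random.Random(seed)
    y1_all, y0_all = [], []          # both potential outcomes (never jointly observed in practice)
    treated_obs, control_obs = [], []  # observed outcomes by treatment arm
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        # Potential outcomes: Y(0) depends on X; treatment adds a constant effect of 2.
        y0 = 5.0 * x + rng.gauss(0, 1)
        y1 = y0 + 2.0
        y1_all.append(y1)
        y0_all.append(y0)
        # Assignment: a fair coin if randomized; otherwise units with X = 1
        # are much more likely to be treated (confounded assignment).
        p = 0.5 if randomized else (0.8 if x == 1 else 0.2)
        w = 1 if rng.random() < p else 0
        # Only the potential outcome under the assigned treatment is observed.
        if w == 1:
            treated_obs.append(y1)
        else:
            control_obs.append(y0)
    true_ate = statistics.mean(y1_all) - statistics.mean(y0_all)
    naive = statistics.mean(treated_obs) - statistics.mean(control_obs)
    return true_ate, naive


if __name__ == "__main__":
    for randomized in (True, False):
        ate, naive = simulate(20000, randomized=randomized, seed=1)
        print(f"randomized={randomized}: true ATE={ate:.3f}, naive estimate={naive:.3f}")
```

Under randomization the naive difference in means lands near the true effect of 2. Under the confounded assignment it lands near 5, because treated units disproportionately have X = 1. Collecting more observations does not remove this bias, which echoes the abstract's point that design, not data quantity, drives credibility.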
