Data science is science's second chance to get causal inference right: A classification of data science tasks
Miguel A. Hernán, John Hsu, Brian Healy

TL;DR
This paper proposes a classification of data science tasks emphasizing the importance of causal inference from observational data, highlighting the role of domain knowledge and clarifying distinctions among description, prediction, and counterfactual prediction.
Contribution
It introduces an explicit classification framework for data science tasks, emphasizing the integration of causal inference and domain expertise in observational data analysis.
Findings
Causal inference requires domain knowledge and appropriate data and algorithms.
Classifying data science tasks clarifies their roles and requirements.
Understanding task distinctions improves data analysis and decision-making.
Abstract
Causal inference from observational data is the goal of many data analyses in the health and social sciences. However, academic statistics has often frowned upon data analyses with a causal objective. The introduction of the term "data science" provides a historic opportunity to redefine data analysis in such a way that it naturally accommodates causal inference from observational data. Like others before, we organize the scientific contributions of data science into three classes of tasks: Description, prediction, and counterfactual prediction (which includes causal inference). An explicit classification of data science tasks is necessary to discuss the data, assumptions, and analytics required to successfully accomplish each task. We argue that a failure to adequately describe the role of subject-matter expert knowledge in data analysis is a source of widespread misunderstandings…
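As a rough illustration of the three task classes named in the abstract, the sketch below computes one quantity for each on a small made-up dataset. Everything in it is an assumption for illustration and not taken from the paper: the counts, the variable names (a binary covariate `l`, a binary treatment `a`, a binary outcome counted in `events`), and the choice of standardization over `l` as the counterfactual method. The standardized contrast only has a causal reading if `l` is the sole confounder, which is exactly the kind of assumption that has to come from subject-matter knowledge and cannot be checked from the data alone.

```python
from collections import namedtuple

# One row per (covariate level, treatment level) cell of a hypothetical study.
# n = number of individuals in the cell, events = number with outcome Y = 1.
Cell = namedtuple("Cell", ["l", "a", "n", "events"])

CELLS = [
    Cell(l=0, a=0, n=80, events=8),
    Cell(l=0, a=1, n=20, events=4),
    Cell(l=1, a=0, n=20, events=8),
    Cell(l=1, a=1, n=80, events=40),
]


def risk(cells):
    """Proportion with Y = 1 among the individuals in the given cells."""
    return sum(c.events for c in cells) / sum(c.n for c in cells)


def description(cells):
    """Description: the marginal risk of the outcome in the whole sample."""
    return risk(cells)


def associational_difference(cells):
    """Prediction-style contrast: observed risk given A = 1 minus given A = 0."""
    treated = [c for c in cells if c.a == 1]
    untreated = [c for c in cells if c.a == 0]
    return risk(treated) - risk(untreated)


def standardized_difference(cells):
    """Counterfactual contrast via standardization over L.

    Interpretable as a causal risk difference only under the (assumed)
    condition that L is sufficient to adjust for confounding.
    """
    total = sum(c.n for c in cells)
    diff = 0.0
    for level in sorted({c.l for c in cells}):
        stratum = [c for c in cells if c.l == level]
        weight = sum(c.n for c in stratum) / total  # P(L = level)
        r1 = risk([c for c in stratum if c.a == 1])
        r0 = risk([c for c in stratum if c.a == 0])
        diff += weight * (r1 - r0)
    return diff


if __name__ == "__main__":
    print("description:", description(CELLS))
    print("associational difference:", associational_difference(CELLS))
    print("standardized difference:", standardized_difference(CELLS))
```

With these hypothetical counts the marginal risk is 0.30, the associational difference is about 0.28, and the standardized difference is about 0.10. The gap between the last two comes entirely from the imbalance of `l` across treatment groups, which is why the predictive and counterfactual tasks need to be kept apart.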
Taxonomy
Methods: Causal inference
