
TL;DR
This paper argues that data analysis is never fully objective: choices about what to measure, and variables left unmeasured, can distort interpretation, as omitted variable bias and Simpson's paradox illustrate.
Contribution
It shows how implicit assumptions and unmeasured factors can mislead the interpretation of data, and urges analysts to examine both critically.
Findings
Omitted variable bias can distort relationships between variables.
Accounting for an unmeasured factor can reverse an apparent correlation.
Superficial analysis can lead to incorrect conclusions.
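The first finding can be sketched with a small simulation. This is an illustration under stated assumptions, not the paper's own analysis: the linear data-generating model, the coefficients (`beta_x`, `beta_z`, the 0.8 link between `x` and `z`), and the helper names are all invented here. When `z` affects `y` and is correlated with `x`, regressing `y` on `x` alone gives a slope near `beta_x + beta_z * cov(x, z) / var(x)` rather than `beta_x`.

```python
import random

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)

def slope(y, x):
    """OLS slope from a simple regression of y on x."""
    return cov(x, y) / cov(x, x)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 20000
beta_x, beta_z = 1.0, 2.0   # assumed true effects (hypothetical values)

z = [rng.gauss(0, 1) for _ in range(n)]            # the variable we will omit
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]       # x is correlated with z
y = [beta_x * xi + beta_z * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Regression that omits z: the slope absorbs part of z's effect.
naive = slope(y, x)

# Textbook omitted-variable-bias prediction: beta_x + beta_z * cov(x, z) / var(x).
predicted = beta_x + beta_z * slope(z, x)

# Controlling for z (Frisch-Waugh style): strip z out of x, then regress y
# on what is left of x.
gamma = slope(x, z)
x_resid = [xi - gamma * zi for xi, zi in zip(x, z)]
adjusted = slope(y, x_resid)

print(f"naive slope:    {naive:.3f}")
print(f"predicted bias: {predicted:.3f}")
print(f"adjusted slope: {adjusted:.3f}")
```

Under these assumptions the naive slope lands close to 2 while the adjusted slope returns to roughly 1, the value of `beta_x`. The bias vanishes if either `beta_z` is zero or `x` and `z` are uncorrelated.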
Abstract
The belief that numbers offer a single, objective description of reality overlooks a crucial truth: data does not speak for itself. Every dataset results from choices (what to measure, how, when, and with whom) that inevitably reflect implicit, and sometimes ideological, assumptions about what is worth quantifying. Moreover, in any analysis, what remains unmeasured can be just as significant as what is captured. When a key variable is omitted, whether by neglect, design, or ignorance, it can distort the observed relationships between other variables. This phenomenon, known as omitted variable bias, may produce misleading correlations or conceal genuine effects. In some cases, accounting for this hidden factor can completely overturn the conclusions drawn from a superficial analysis. This is precisely the mechanism behind Simpson's paradox.
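The reversal described above can be seen in a toy table. The counts below are hypothetical, invented purely for illustration and not taken from the paper: two treatments, `A` and `B`, are compared across patients whose condition is `mild` or `severe`, and severity plays the role of the hidden factor.

```python
# Hypothetical counts, invented for illustration: (recoveries, patients).
counts = {
    ("A", "mild"): (18, 20),
    ("A", "severe"): (48, 80),
    ("B", "mild"): (64, 80),
    ("B", "severe"): (10, 20),
}

def stratum_rate(treatment, severity):
    """Recovery rate within one severity group."""
    recovered, total = counts[(treatment, severity)]
    return recovered / total

def pooled_rate(treatment):
    """Recovery rate when severity is ignored (groups added together)."""
    cells = [cell for (t, _), cell in counts.items() if t == treatment]
    return sum(r for r, _ in cells) / sum(n for _, n in cells)

for severity in ("mild", "severe"):
    print(severity, stratum_rate("A", severity), stratum_rate("B", severity))
print("pooled", pooled_rate("A"), pooled_rate("B"))
```

In this made-up table, A recovers more often within each severity group (90% vs 80% for mild, 60% vs 50% for severe) yet less often once the groups are pooled (66% vs 74%). The reversal arises because A is given mostly to severe cases, so ignoring severity penalizes it.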
