
TL;DR
This paper explains Simpson's paradox using simple algebra and geometry, clarifying when and how the reversal of associations occurs, and discusses its implications in predictive and causal contexts.
Contribution
It provides an accessible geometric and algebraic explanation of Simpson's paradox, explicitly states the conditions for its occurrence, and discusses its relevance in practical and causal reasoning contexts.
Findings
The paradox can occur under non-extreme dependence between variables.
It is always possible to define a third variable algebraically to produce the paradox.
Whether the paradox occurs in practice depends on domain experts identifying that third variable with a practical meaning in context.
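A small numerical sketch may help make the reversal concrete. The counts below are illustrative and are not taken from the paper; the names `counts`, `rate`, and `has_reversal` are hypothetical helpers introduced only for this example. The sketch checks that one treatment has the higher success rate inside every stratum of a third binary variable while having the lower rate once the strata are pooled.

```python
from fractions import Fraction

# Illustrative (successes, total) counts per (treatment, stratum).
# These numbers are an assumption for demonstration, not data from the paper.
counts = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}


def rate(treatment, stratum=None):
    """Success rate for a treatment, within one stratum or pooled over all strata."""
    cells = [
        cell
        for (t, s), cell in counts.items()
        if t == treatment and (stratum is None or s == stratum)
    ]
    successes = sum(c[0] for c in cells)
    total = sum(c[1] for c in cells)
    return Fraction(successes, total)


def has_reversal():
    """True if A beats B in every stratum but loses to B after pooling."""
    strata = {s for (_, s) in counts}
    within = all(rate("A", s) > rate("B", s) for s in strata)
    pooled = rate("A") < rate("B")
    return within and pooled


if __name__ == "__main__":
    for s in ("small", "large"):
        print(s, float(rate("A", s)), float(rate("B", s)))
    print("pooled", float(rate("A")), float(rate("B")))
    print("reversal:", has_reversal())
```

Exact fractions are used instead of floats so the comparisons are not affected by rounding. The reversal appears because the two treatments are spread very differently across the strata, so the pooled rates weight the strata unequally.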
Abstract
Simpson's paradox is well known, yet it remains puzzling and surprising to many, especially empirical researchers and users of statistics. Mathematically, however, there is nothing surprising about it. Much has been written about the paradox, but most of it is beyond the grasp of such users. This short article explains the phenomenon in an accessible way using simple algebra and geometry. The mathematical conditions under which the paradox can occur are made explicit, and a simple geometrical illustration is used to describe it. We consider the reversal of the association between two binary variables by a third binary variable. We show that it is always possible to define the third variable algebraically for non-extreme dependence between the two variables; therefore, occurrence of the paradox depends on identifying it with a practical meaning for it in a…
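The reversal described in the abstract can be written out in standard probability notation. The symbols X, Y, and Z are assumptions introduced here for the two associated binary variables and the third binary variable, since the abstract's own symbols did not survive on this page; the direction of the inequalities is one of the two possible cases.

```latex
\begin{align}
P(Y=1 \mid X=1) &> P(Y=1 \mid X=0), \\
P(Y=1 \mid X=1, Z=z) &< P(Y=1 \mid X=0, Z=z), \quad z \in \{0,1\}, \\
P(Y=1 \mid X=x) &= \sum_{z \in \{0,1\}} P(Y=1 \mid X=x, Z=z)\, P(Z=z \mid X=x).
\end{align}
```

The third line is the law of total probability. Each marginal rate is a weighted average of the conditional rates, and the weights P(Z=z | X=x) may differ between x=1 and x=0, which is why the first two lines can hold together.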
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Advanced Causal Inference Techniques · Statistical Methods in Clinical Trials
