Spatio-Causal Patterns of Sample Growth
Andre F. Ribeiro

TL;DR
This paper explores how different types of sample growth affect the ability of systems to make fair, interpretable, and generalizable predictions across space and time, using historical census data.
Contribution
It introduces a theoretical framework that distinguishes unconfounded from externally valid sample growth and links each to prediction fairness and generalizability, with empirical illustrations from census data.
Findings
Unconfounded samples enable fair, interpretable predictions.
Externally-valid samples support generalization across out-of-sample variation.
Connections among Shapley values, counterfactuals, and hyperbolic geometry are established.
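The findings mention Shapley values. Below is a minimal sketch of the standard exact Shapley value, which averages each variable's marginal contribution over every ordering of the variables. It is a generic illustration, not the paper's construction. The variable names, effect sizes, and interaction term are hypothetical, chosen only to show that additive (independent) effects are recovered exactly while a shared interaction is split between the variables involved.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p given the players already present.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


# Hypothetical toy value function: additive effects plus one interaction
# that only appears when both "location" and "year" are present.
EFFECTS = {"location": 2.0, "year": 1.0, "occupation": 0.5}
INTERACTION = 0.6


def toy_value(coalition):
    v = sum(EFFECTS[p] for p in coalition)
    if {"location", "year"} <= coalition:
        v += INTERACTION
    return v


phi = shapley_values(list(EFFECTS), toy_value)
print(phi)
```

The interaction of 0.6 is split evenly between "location" and "year", so they receive about 2.3 and 1.3 (up to floating-point rounding). "occupation" keeps its independent effect of 0.5. The attributions sum to the full-coalition value of 4.1.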
Abstract
Different statistical samples (e.g., from different locations) offer populations and learning systems observations with distinct statistical properties. Samples under (1) 'Unconfounded' growth preserve systems' ability to determine the independent effects of their individual variables on any outcome of interest, and therefore lead to fair and interpretable black-box predictions. Samples under (2) 'Externally-Valid' growth preserve their ability to make predictions that generalize across out-of-sample variation. The first promotes predictions that generalize over populations; the second, over their shared uncontrolled factors. We illustrate these theoretical patterns in the full American census from 1840 to 1940, with samples ranging from the street level to the national level. This reveals sample requirements for generalizability over space and time, and new connections among the…
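The abstract ties unconfounded samples to recovering the independent effect of each variable. The paper's formal definition is not reproduced here. As a loose, hypothetical illustration, assume "unconfounded" simply means a sample in which two covariates are uncorrelated, and "confounded" means one in which they are correlated. The sketch simulates both cases with a true effect of 1.0 for each covariate. It compares a single-variable regression slope with the joint least-squares estimate. The data, coefficients, and function names are all assumptions for illustration.

```python
import random


def simulate(confounded, n=20000, seed=0):
    """Return (single-variable slope of y on x1, joint b1, joint b2).

    Hypothetical setup: y = 1.0*x1 + 1.0*x2 + small noise.
    If confounded, x2 = 0.8*x1 + noise, so x1 and x2 are correlated.
    """
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = 0.8 * a + rng.gauss(0.0, 0.6) if confounded else rng.gauss(0.0, 1.0)
        x1.append(a)
        x2.append(b)
        y.append(1.0 * a + 1.0 * b + rng.gauss(0.0, 0.1))

    # Center all series so no intercept term is needed.
    def center(v):
        m = sum(v) / len(v)
        return [t - m for t in v]

    c1, c2, cy = center(x1), center(x2), center(y)

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)

    naive = s1y / s11  # slope of y on x1 alone
    det = s11 * s22 - s12 * s12  # 2x2 normal equations
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return naive, b1, b2


for label, flag in [("uncorrelated", False), ("correlated", True)]:
    naive, b1, b2 = simulate(flag)
    print(f"{label}: x1-only slope={naive:.3f}, joint b1={b1:.3f}, b2={b2:.3f}")
```

In the uncorrelated sample, the single-variable slope already approximates the independent effect of about 1.0. In the correlated sample, it absorbs part of x2's effect (about 1.8), and only the joint fit separates the two.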
Taxonomy
Topics: Water resources management and optimization · Statistics Education and Methodologies · Forecasting Techniques and Applications
