Ecological Regression with Partial Identification
Wenxin Jiang, Gary King, Allen Schmaltz, and Martin A. Tanner

TL;DR
This paper introduces a partially identified linear model for ecological inference and evaluates it on a large, diverse collection of datasets with known ground truth, where it substantially narrows the deterministic bounds while still capturing the true district-level parameter in most cases.
Contribution
It presents an approach that limits the scope of non-identifiability in ecological inference and evaluates it on the largest collection of datasets with known ground truth applied to the problem to date.
Findings
Reduces the width of the deterministic bound by about 45% on average.
Captures the true district-level parameter in over 97% of cases.
Provides publicly available datasets for future research.
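For readers unfamiliar with the deterministic bound mentioned in the findings, the following is a minimal sketch of the classical method of bounds for a 2x2 ecological inference problem. It is not the authors' code. The function name district_bounds and the variable names x (group share per unit), t (outcome share per unit), and n (unit size) are assumptions made for illustration, and every unit is assumed to have x > 0.

```python
import numpy as np

def district_bounds(x, t, n):
    """Deterministic bounds on the district-level rate for one group.

    x: share of the group of interest in each unit (assumed 0 < x <= 1)
    t: share of each unit showing the outcome
    n: number of individuals in each unit
    Names and interface are hypothetical, chosen for this sketch.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    n = np.asarray(n, dtype=float)

    # Accounting identity per unit: t = beta_b * x + beta_w * (1 - x),
    # with both unknown rates constrained to [0, 1].
    lower = np.clip((t - (1.0 - x)) / x, 0.0, 1.0)  # beta_w at its maximum, 1
    upper = np.clip(t / x, 0.0, 1.0)                # beta_w at its minimum, 0

    # The district-level parameter weights each unit by its group count.
    w = n * x
    return float(w @ lower / w.sum()), float(w @ upper / w.sum())

lo, hi = district_bounds([0.5, 0.8], [0.4, 0.6], [100, 200])
print(f"district-level bounds: [{lo:.3f}, {hi:.3f}], width {hi - lo:.3f}")
```

The width hi - lo is the quantity that the paper reports narrowing by about 45% on average.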
Abstract
Ecological inference (EI) is the process of learning about individual behavior from aggregate data. We study a partially identified linear contextual effects model for EI and describe how to estimate the district level parameter averaging over many precincts in the presence of the non-identified parameter of the contextual effect. This may be regarded as a first attempt in this venerable literature to limit the scope of the key form of non-identifiability in EI. To study the operating characteristics of our model, we have amassed the largest collection of data with known ground truth ever applied to evaluate solutions to the EI problem. We collect and study 459 datasets from a variety of fields including public health, political science, and sociology. The datasets contain a total of 2,370,854 geographic units (e.g., precincts), with an average of 5,165 geographic units per dataset. Our…
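To see why a linear contextual effect leaves one parameter non-identified, the derivation below works through one common parametrization. It is a sketch under stated assumptions and may differ from the paper's own notation: the symbols T_i, X_i, N_i, b_0, b_1, w_0, w_1 are introduced here for illustration only.

```latex
% Accounting identity for unit i (X_i = group share, T_i = outcome share):
T_i = \beta_i^b X_i + \beta_i^w (1 - X_i)

% Assumed linear contextual effects (each group's rate varies with X_i):
\beta_i^b = b_0 + b_1 X_i, \qquad \beta_i^w = w_0 + w_1 X_i

% Substituting gives a quadratic in X_i:
T_i = w_0 + (b_0 - w_0 + w_1)\, X_i + (b_1 - w_1)\, X_i^2

% Only three coefficients are estimable from (X_i, T_i), but there are
% four unknowns (b_0, b_1, w_0, w_1), so one parameter is not identified.

% District-level parameter for the group, weighting units by N_i X_i:
B^b = \frac{\sum_i N_i X_i \beta_i^b}{\sum_i N_i X_i}
    = b_0 + b_1 \, \frac{\sum_i N_i X_i^2}{\sum_i N_i X_i}
```

Under this parametrization, any restriction on the range of the unidentified parameter translates into an interval for the district-level parameter. How the paper chooses that restriction is not stated in this summary.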
