Socio-Conformal Calibration in Complex Survey Data: Marginal Validity Is Not Enough for Subgroup Reliability
Amir Rafe, Subasish Das

TL;DR
This study evaluates conformal prediction methods for survey-based social measurement. It shows that marginal validity does not guarantee subgroup reliability and that naive group-specific calibration can itself be unreliable.
Contribution
The study shows that standard and group-specific conformal methods often fail to provide reliable subgroup uncertainty estimates in complex survey data.
Findings
Standard conformal achieves nominal marginal coverage but leaves subgroup gaps.
Group-specific conformal can worsen fairness-efficiency trade-offs.
Regularized calibration reduces subgroup gaps but does not improve fairness.
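The two calibration schemes contrasted in these findings can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a base classifier that outputs probabilities over the five ordinal levels and uses the simple nonconformity score 1 - p_hat(y | x); the ordinal score used in the paper is not specified here. Standard split conformal derives one threshold from all calibration respondents. Mondrian conformal derives a separate threshold per subgroup, so small subgroups pay a larger finite-sample correction and can receive very large (even full) sets. All function names are hypothetical.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: every label must be included.
        return np.inf
    return np.sort(scores)[k - 1]


def nonconformity(probs, labels):
    """Assumed score: 1 minus the predicted probability of the true level."""
    return 1.0 - probs[np.arange(len(labels)), labels]


def split_conformal_sets(p_cal, y_cal, p_test, alpha=0.1):
    """Standard split conformal: one threshold shared by all respondents."""
    q = conformal_quantile(nonconformity(p_cal, y_cal), alpha)
    # Level k enters the set when its score 1 - p_test[:, k] is <= q.
    return (1.0 - p_test) <= q


def mondrian_sets(p_cal, y_cal, g_cal, p_test, g_test, alpha=0.1):
    """Mondrian (group-specific) conformal: a separate threshold per subgroup."""
    sets = np.zeros(p_test.shape, dtype=bool)
    for g in np.unique(g_test):
        cal, tst = g_cal == g, g_test == g
        # Calibrate only on respondents from the same subgroup.
        q = conformal_quantile(nonconformity(p_cal[cal], y_cal[cal]), alpha)
        sets[tst] = (1.0 - p_test[tst]) <= q
    return sets
```

Both functions return a boolean respondents-by-levels matrix, so coverage and set size can be read off directly. The per-group loop is what yields group-conditional validity, and it is also why thin subgroups get wider sets.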
Abstract
Machine-learning systems used in survey-based social measurement require uncertainty estimates that are reliable across population subgroups, not merely valid in aggregate. We study ordinal conformal prediction for five-level AI-attitude forecasting on the Pew American Trends Panel (Wave 152; n = 4,591; 12 race × education subgroups), comparing standard split conformal, Mondrian (group-specific) conformal, and a regularized Mondrian comparator across 100 respondent-disjoint splits with survey-weighted evaluation. Standard conformal achieves nominal marginal coverage for all four base predictors but leaves weighted subgroup gaps of ~13 percentage points. For the strongest predictor (XGBoost), Mondrian worsens the fairness-efficiency trade-off: weighted set size rises by +0.036 (d_z = 1.66) while the weighted subgroup gap grows by +0.013 (d_z = 0.30). A regularized comparator that shrinks group…
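The survey-weighted quantities in the abstract can be illustrated with a short sketch. The abstract does not define them precisely, so the following definitions are assumptions. Weighted coverage and weighted set size are taken as weight-normalized means over respondents. The "weighted subgroup gap" is taken as the range (best minus worst) of weighted coverage across subgroups. d_z is taken as the paired Cohen's d over per-split differences, for example Mondrian minus standard set size across the 100 splits. The helper names are hypothetical.

```python
import numpy as np


def weighted_mean(x, w):
    """Weight-normalized mean, e.g. with survey weights."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    return float(np.sum(w * x) / np.sum(w))


def survey_weighted_metrics(sets, y, w, groups):
    """Weighted coverage, set size and subgroup coverage gap.

    sets:   boolean array (respondents x ordinal levels), True = level in set
    y:      integer true level per respondent (0-based)
    w:      survey weight per respondent (numpy array)
    groups: subgroup label per respondent (e.g. a race x education cell)
    """
    covered = sets[np.arange(len(y)), y].astype(float)
    size = sets.sum(axis=1).astype(float)
    group_cov = {
        g: weighted_mean(covered[groups == g], w[groups == g])
        for g in np.unique(groups)
    }
    return {
        "coverage": weighted_mean(covered, w),
        "set_size": weighted_mean(size, w),
        "group_coverage": group_cov,
        # Assumed gap definition: best minus worst weighted subgroup coverage.
        "subgroup_gap": max(group_cov.values()) - min(group_cov.values()),
    }


def cohens_dz(a, b):
    """Paired effect size: mean(a - b) / sd(a - b), e.g. over per-split results."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(d.mean() / d.std(ddof=1))
```

Under these assumed definitions, one would call `survey_weighted_metrics` once per split for each method, then pass the two methods' per-split set sizes (or gaps) to `cohens_dz` to obtain a paired comparison of the kind reported above.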
