Simpson's Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics
Chao Zhou

TL;DR
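This paper shows that aggregating behavioral curves produces a Simpson's paradox: because of survival bias, the aggregate engagement curve peaks far later than the curves of individual users. The size of the distortion differs across datasets, and the paper proposes a calibration method to reduce false positives in per-user classification.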
Contribution
It identifies the systematic distortion that aggregation introduces into behavioral curve modeling, attributes it to survival bias, and introduces Synthetic Null Calibration to address a 32% false positive rate in per-user classification.
Findings
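On Goodreads, individual users peak at ~11 exposures, while the aggregate curve peaks at ~34 exposures (about a 3x gap)
The distortion varies across datasets, from essentially none on MovieLens-25M (D approximately 1, the negative control) up to 5.3x on Amazon Electronics
Synthetic Null Calibration addresses a 32% false positive rate in per-user classification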
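The "x" figures above can be read as a ratio of peak locations. The abstract uses D without defining it in the excerpt shown here, so the definition below is an assumption: D is taken to be the aggregate peak exposure divided by the individual peak exposure. Under that reading, the Goodreads numbers work out as follows.

```latex
D = \frac{n^*_{\text{aggregate}}}{n^*_{\text{individual}}},
\qquad
D_{\text{Goodreads}} \approx \frac{34}{11} \approx 3.1
```

With this definition, D approximately 1 means the aggregate and individual peaks coincide, which is what the MovieLens-25M negative control reports.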
Abstract
Behavioral curve modeling -- fitting parametric functions to engagement-versus-exposure data -- is standard practice in recommendation, advertising, and clinical dosing. We show that aggregation introduces a systematic distortion: Simpson's paradox in behavioral curves. On Goodreads (3.3M users, 9 genres), individual users peak at n* approximately 11 exposures while the aggregate peaks at n* approximately 34 -- a 3x gap driven by survival bias. Amazon Electronics (18M reviews) shows a 5.3x distortion. MovieLens-25M (D approximately 1) serves as a negative control, confirming that survival bias -- not aggregation per se -- is the operative mechanism. The distortion is robust to category granularity, engagement operationalization, and classifier calibration. We develop Synthetic Null Calibration to address a 32% false positive rate in per-user classification. Our findings apply wherever…
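The survival-bias mechanism described in the abstract can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the paper's method or data: the function name `simulate_survival_bias`, the quadratic curve shape, the additive per-user baseline, and the rule linking baseline to lifetime are all hypothetical choices made for illustration. Every simulated user has a curve that peaks at 11 exposures, yet when more engaged users stay longer, the average over surviving users peaks later.

```python
def simulate_survival_bias(num_users=2000, max_exposure=60, peak=11,
                           width=30.0, selective_dropout=True):
    """Toy model: return (typical individual peak, aggregate peak) in exposures."""
    exposures = range(1, max_exposure + 1)
    # Shared inverted-U shape, maximal at `peak` for every user (assumed form).
    shape = {n: 1.0 - ((n - peak) / width) ** 2 for n in exposures}

    users = []
    for i in range(num_users):
        baseline = 3.0 * i / (num_users - 1)       # per-user engagement level
        if selective_dropout:
            lifetime = 15 + round(15 * baseline)   # more engaged users stay longer
        else:
            lifetime = max_exposure                # everyone is observed throughout
        users.append((baseline, lifetime))

    # Per-user peak: argmax of each user's own observed curve.
    individual_peaks = sorted(
        max(range(1, lifetime + 1), key=lambda n: baseline + shape[n])
        for baseline, lifetime in users
    )
    typical_individual_peak = individual_peaks[len(individual_peaks) // 2]

    # Aggregate curve: mean engagement at n over users still present at n.
    aggregate = {}
    for n in exposures:
        present = [b + shape[n] for b, life in users if life >= n]
        aggregate[n] = sum(present) / len(present)
    aggregate_peak = max(aggregate, key=aggregate.get)
    return typical_individual_peak, aggregate_peak


if __name__ == "__main__":
    ind, agg = simulate_survival_bias()
    print("with selective dropout:", ind, agg)
    ind0, agg0 = simulate_survival_bias(selective_dropout=False)
    print("without selective dropout:", ind0, agg0)
```

With selective dropout switched on, the aggregate peak lands to the right of the individual peak even though no single user's curve peaks there. With `selective_dropout=False`, every user is observed at every exposure count and the two peaks coincide, which mirrors the role the abstract assigns to the negative control: aggregation alone does not move the peak, selective survival does.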
