Coarsened data in small area estimation: a Bayesian two-part model for mapping smoking behaviour
Aldo Gardini, Lorenzo Mori

TL;DR
This paper develops a Bayesian two-part small area estimation model for smoking behaviour across Italian regions and age groups, accounting for data coarsening effects such as rounding and topcoding.
Contribution
It introduces a novel Bayesian framework that explicitly models coarsening mechanisms in semi-continuous responses for improved small area estimates.
Findings
Ignoring coarsening leads to biased estimates.
Relative to models that ignore coarsening, the proposed model achieves better accuracy and interval coverage.
Application reveals detailed smoking patterns across regions and ages.
Abstract
Estimating health indicators for restricted sub-populations is a recurring challenge in epidemiology and public health. When survey data are used, Small Area Estimation (SAE) methods can improve precision by borrowing strength across domains. In many applications, however, outcomes are self-reported and affected by coarsening mechanisms, such as rounding and digit preference, that reduce data resolution and may bias inference. This paper addresses both issues by developing a Bayesian unit-level SAE framework for semi-continuous, coarsened responses. Motivated by the 2019 Italian European Health Interview Survey, we estimate smoking indicators for domains defined by the cross-classification of Italian regions and age groups, capturing both smoking prevalence and intensity. The model adopts a two-part structure: a logistic component for smoking prevalence and a flexible mixture of…
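The two-part idea and the treatment of rounding described in the abstract can be illustrated with a minimal sketch of a single observation's likelihood. This is not the authors' model. The abstract is cut off before it names the intensity distribution, so a single log-normal is used here purely as a stand-in. The rounding bases and their probabilities (`base_probs`), and all function names, are hypothetical. Random effects, the small area structure, and posterior sampling are omitted.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal with log-scale mean mu and log-scale sd sigma."""
    if x <= 0.0:
        return 0.0
    return normal_cdf((math.log(x) - mu) / sigma)


def rounding_interval(y, base):
    """True values that would be reported as y when rounding to multiples of base."""
    half = base / 2.0
    return max(y - half, 0.0), y + half


def two_part_loglik(y, p_smoker, mu, sigma, base_probs):
    """Log-likelihood of one reported integer y (e.g. cigarettes per day).

    y == 0: non-smoker, contributes log(1 - p_smoker).
    y > 0:  smoker; the report is treated as the latent intensity rounded to a
            multiple of an unknown base, marginalised over the assumed bases.
    """
    if y == 0:
        return math.log(1.0 - p_smoker)
    lik = 0.0
    for base, prob in base_probs.items():
        if y % base != 0:
            continue  # this base cannot have produced the reported value
        lo, hi = rounding_interval(y, base)
        lik += prob * (lognormal_cdf(hi, mu, sigma) - lognormal_cdf(lo, mu, sigma))
    if lik <= 0.0:
        raise ValueError("reported value is incompatible with every rounding base")
    return math.log(p_smoker) + math.log(lik)


# Hypothetical settings: 20% smokers, median intensity 12, three rounding bases.
base_probs = {1: 0.5, 5: 0.3, 10: 0.2}
for y in (0, 13, 20):
    print(y, two_part_loglik(y, 0.2, math.log(12.0), 0.6, base_probs))
```

A reported value of 20 is compatible with all three bases, so it pools probability mass from the intervals [19.5, 20.5], [17.5, 22.5] and [15, 25], whereas 13 can only arise from rounding to the nearest unit. Treating heaped reports as exact values discards this distinction, which is the kind of resolution loss the abstract refers to.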
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Bayesian Methods and Mixture Models · Data-Driven Disease Surveillance
