Correcting for non-ignorable missingness in smoking trends
Juho Kopra, Tommi Härkänen, Hanna Tolonen, Juha Karvanen

TL;DR
This paper introduces a Bayesian method that uses register-based follow-up data to correct for non-ignorable missing data in health examination surveys, and finds that smoking prevalence estimates may be significantly affected by missingness.
Contribution
It presents a novel approach combining registry data with Bayesian modeling to address non-ignorable missingness in survey data, demonstrated on Finnish smoking prevalence data.
Findings
Estimated smoking prevalence rates in Finland may be significantly affected by missing data.
The parameters of the missingness mechanism are estimable even though the original survey data are MNAR.
Registry data enhances the correction of non-ignorable missingness in surveys.
Abstract
Data missing not at random (MNAR) is a major challenge in survey sampling. We propose an approach based on registry data to deal with non-ignorable missingness in health examination surveys. The approach relies on follow-up data available from administrative registers several years after the survey. For illustration, we use data on smoking prevalence from the Finnish National FINRISK study conducted in 1972-1997. The data consist of measured survey information including missingness indicators, register-based background information and register-based time-to-disease survival data. The parameters of the missingness mechanism are estimable with these data although the original survey data are MNAR. The underlying data generation process is modelled by a Bayesian model. The results indicate that the estimated smoking prevalence rates in Finland may be significantly affected by missing data.
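To make the identification idea in the abstract concrete, the following is a minimal simulation sketch, not the paper's Bayesian model. It assumes a binary disease indicator from register follow-up (rather than time-to-disease data), no background covariates, and, crucially, that disease risk given smoking status is the same for participants and non-participants. All numeric values (prevalence 0.30, participation and disease probabilities) and all function names are hypothetical and chosen only for illustration. Under these assumptions, the disease rate among non-participants is a mixture of the smoker and non-smoker risks, so the smoking share among non-participants can be solved for with a simple moment equation.

```python
import random


def simulate(n=200_000, seed=0):
    """Simulate (smoker, participates, disease) for n people.

    All probabilities are hypothetical illustration values.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.30
        # MNAR: participation depends on the (possibly unobserved) smoking status.
        participates = rng.random() < (0.50 if smoker else 0.80)
        # Register follow-up: disease is observed for everyone, participant or not.
        disease = rng.random() < (0.20 if smoker else 0.05)
        rows.append((smoker, participates, disease))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def estimates(rows):
    part = [r for r in rows if r[1]]
    nonpart = [r for r in rows if not r[1]]

    # Complete-case estimate: smoking share among participants only.
    naive = mean(s for s, _, _ in part)

    # Disease risks by smoking status, estimated from participants.
    risk_smoker = mean(d for s, _, d in part if s)
    risk_nonsmoker = mean(d for s, _, d in part if not s)

    # Disease rate among non-participants is a mixture of the two risks:
    #   d_non = p_non * risk_smoker + (1 - p_non) * risk_nonsmoker
    d_non = mean(d for _, _, d in nonpart)
    p_non = (d_non - risk_nonsmoker) / (risk_smoker - risk_nonsmoker)

    # Combine participants and non-participants, weighted by group size.
    corrected = (len(part) * naive + len(nonpart) * p_non) / len(rows)

    # True prevalence is known here only because the data are simulated.
    true = mean(s for s, _, _ in rows)
    return true, naive, corrected


true_prev, naive_prev, corrected_prev = estimates(simulate())
print(f"true={true_prev:.3f} naive={naive_prev:.3f} corrected={corrected_prev:.3f}")
```

In this sketch the complete-case estimate understates prevalence because smokers participate less often, while the follow-up-based estimate recovers a value close to the simulated truth. A full Bayesian treatment, as described in the abstract, would additionally model survival times and background information and would propagate uncertainty in all parameters.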
