Bayesian models for data missing not at random in health examination surveys
Juho Kopra, Juha Karvanen, Tommi Härkänen

TL;DR
This paper introduces a Bayesian data augmentation and survival modeling approach to address nonresponse bias in health surveys with data missing not at random, improving risk factor estimates.
Contribution
It presents a novel Bayesian method incorporating follow-up data to effectively reduce bias from MNAR missing data in epidemiological surveys.
Findings
The proposed approach appears to reduce nonresponse bias substantially.
Commonly applied MAR imputation approaches, by contrast, were not successful in reducing the bias.
A simulation experiment was carried out to study the validity of the approaches.
Abstract
In epidemiological surveys, data missing not at random (MNAR) due to survey nonresponse may lead to bias in the risk factor estimates. We propose an approach based on Bayesian data augmentation and survival modelling to reduce the nonresponse bias. The approach requires additional information based on follow-up data. We present a case study of smoking prevalence using FINRISK data collected between 1972 and 2007 with a follow-up to the end of 2012 and compare it to other commonly applied missing at random (MAR) imputation approaches. A simulation experiment is carried out to study the validity of the approaches. Our approach appears to reduce the nonresponse bias substantially, whereas MAR imputation was not successful in bias reduction.
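To illustrate why follow-up data can carry information about nonrespondents, the toy simulation below may help. It is not the authors' Bayesian data augmentation or survival model; it is a simplified moment-based sketch under stated assumptions. All numbers (prevalence, response and death probabilities) are hypothetical, and the sketch assumes that the follow-up death risk given smoking status is the same for respondents and nonrespondents.

```python
import random

def simulate(n=200_000, seed=1):
    """Simulate (smoker, responded, died) triples with MNAR nonresponse.
    All probabilities are hypothetical values chosen for illustration."""
    rng = random.Random(seed)
    p_smoke = 0.30                 # true smoking prevalence
    p_resp = {1: 0.50, 0: 0.80}    # smokers respond less often (MNAR)
    p_death = {1: 0.20, 0: 0.10}   # smokers die more often during follow-up
    rows = []
    for _ in range(n):
        s = 1 if rng.random() < p_smoke else 0
        r = 1 if rng.random() < p_resp[s] else 0
        d = 1 if rng.random() < p_death[s] else 0
        rows.append((s, r, d))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rows = simulate()
true_prev = mean(s for s, r, d in rows)

# Smoking status is observed only for respondents;
# follow-up deaths are observed for everyone.
resp = [(s, d) for s, r, d in rows if r == 1]
nonresp_deaths = [d for s, r, d in rows if r == 0]

# Complete-case estimate: ignores nonrespondents, so it is biased under MNAR.
cc_prev = mean(s for s, d in resp)

# Death rates by smoking status, estimated from respondents.
m1 = mean(d for s, d in resp if s == 1)
m0 = mean(d for s, d in resp if s == 0)

# Nonrespondent death rate is a mixture: m_nr = pi*m1 + (1 - pi)*m0.
# Solve for pi, the smoking prevalence among nonrespondents.
m_nr = mean(nonresp_deaths)
pi_nr = min(1.0, max(0.0, (m_nr - m0) / (m1 - m0)))

# Combine respondent and nonrespondent prevalences by their shares.
resp_rate = len(resp) / len(rows)
adj_prev = resp_rate * cc_prev + (1 - resp_rate) * pi_nr

print(f"true prevalence:       {true_prev:.3f}")
print(f"complete-case estimate: {cc_prev:.3f}")
print(f"follow-up adjusted:     {adj_prev:.3f}")
```

In this setup the complete-case estimate understates smoking prevalence because smokers are less likely to respond, while the excess mortality among nonrespondents lets the adjusted estimate recover part of the missing information. A full Bayesian treatment, as described in the abstract, would instead impute the missing smoking indicators within a survival model rather than solve a single mixture equation.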
