Analysis of ELSA COVID-19 Substudy response rate using machine learning algorithms
Marjan Qazvini

TL;DR
This study applies several machine learning algorithms to predict non-response in the ELSA COVID-19 Substudy and finds that no single model wins on every metric: random forest achieves the highest balanced accuracy, K-nearest neighbours the best precision and test accuracy, and logistic regression the highest AUC.
Contribution
It applies multiple ML algorithms to the prediction of survey non-response in a longitudinal ageing study and identifies which models perform best on each evaluation metric.
Findings
Random forest achieved the highest balanced accuracy.
K-nearest neighbors had the best precision and test accuracy.
Logistic regression showed the highest AUC in ROC analysis.
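The findings above rank models differently depending on the metric, which is common when the classes are imbalanced. The following is a minimal sketch, in plain Python, of how balanced accuracy, precision, test accuracy, and AUC are computed. The labels, predictions, and scores are invented toy values (1 = non-response), not ELSA data, and the function names are illustrative assumptions rather than anything taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the positive class and the recall on the negative class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall_pos = tp / (tp + fn)
    recall_neg = tn / (tn + fp)
    return (recall_pos + recall_neg) / 2


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 3 non-respondents out of 10 participants (hypothetical values).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # cautious model: flags few people
pred_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # liberal model: flags more people
scores = [0.9, 0.4, 0.35, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name,
          "accuracy=%.3f" % accuracy(y_true, pred),
          "precision=%.3f" % precision(y_true, pred),
          "balanced_accuracy=%.3f" % balanced_accuracy(y_true, pred))
print("AUC=%.3f" % roc_auc(y_true, scores))
```

In this toy case both sets of predictions have the same accuracy, yet the cautious one has higher precision and the liberal one has higher balanced accuracy, which is how different models can each come out ahead on a different metric.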
Abstract
Every year, National Statistical Organisations spend time and money collecting information through surveys. Some of these surveys include follow-up studies, and some participants usually do not take part in later waves because of factors such as death, immigration, change of employment, or health. In this study, we focus on the English Longitudinal Study of Ageing (ELSA) COVID-19 Substudy, which was carried out in two waves during the COVID-19 pandemic. Some participants from wave 1 did not participate in wave 2. Our purpose is to predict non-response using Machine Learning (ML) algorithms such as K-nearest neighbours (KNN), random forest (RF), AdaBoost, logistic regression, neural networks (NN), and support vector classifier (SVC). We find that RF outperforms other models in terms of balanced accuracy, KNN in terms of precision and test accuracy, and logistics…
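A comparison of this kind could be set up roughly as follows. This is only a sketch under stated assumptions: it uses scikit-learn, a synthetic imbalanced dataset from `make_classification` in place of the ELSA data, default or arbitrary hyperparameters, and a simple stratified train/test split. None of these choices are taken from the paper, and the numbers it prints say nothing about the study's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             precision_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the survey data (NOT ELSA): label 1 = non-response.
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# The six model families named in the abstract, with illustrative settings.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "NN": MLPClassifier(max_iter=1000, random_state=0),
    "SVC": SVC(probability=True, random_state=0),
}


def evaluate(models):
    """Fit each model on the training split and score it on the test split."""
    results = {}
    for name, model in models.items():
        # Standardise features first; distance- and gradient-based models need it.
        clf = make_pipeline(StandardScaler(), model)
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        prob = clf.predict_proba(X_test)[:, 1]
        results[name] = {
            "balanced_accuracy": balanced_accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "test_accuracy": accuracy_score(y_test, pred),
            "auc": roc_auc_score(y_test, prob),
        }
    return results


results = evaluate(models)
for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Reporting all four metrics side by side, rather than a single score, makes it visible when one model leads on balanced accuracy while another leads on precision or AUC.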
