Evaluating the performance of personal, social, health-related, biomarker and genetic data for predicting an individual's future health using machine learning: A longitudinal analysis
Mark Green

TL;DR
This study uses machine learning to evaluate how well personal, social, health-related, biomarker, and genetic data predict future health. Health-related data are the most predictive, and more complex models offer only limited improvement.
Contribution
It compares the effectiveness of different data types and machine learning methods in predicting future health, highlighting the limited benefit of added complexity.
Findings
Health-related measures are the strongest predictors of future health.
Genetic data performs poorly in predicting health outcomes.
Machine learning models show marginal improvements over traditional logistic regression.
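The kind of comparison summarised in these findings can be sketched as follows. This is a minimal illustration, not the study's pipeline: the data are synthetic, the outcome is deliberately constructed so that the "health" block carries most of the signal, a random forest stands in for "a more complex model", and cross-validated AUC is an assumed choice of metric. The names `blocks`, `models`, and `block_auc` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000  # synthetic sample size, unrelated to the study's cohort

# Synthetic stand-ins for the five feature blocks (NOT the study's variables).
blocks = {
    "personal": rng.normal(size=(n, 2)),
    "social": rng.normal(size=(n, 3)),
    "health": rng.normal(size=(n, 4)),
    "biomarker": rng.normal(size=(n, 5)),
    "genetic": rng.normal(size=(n, 10)),
}

# Assumed outcome: driven mainly by one "health" feature, purely by construction,
# so the ranking of blocks below is built in rather than discovered.
logit = 1.5 * blocks["health"][:, 0] + 0.5 * blocks["personal"][:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Baseline logistic regression versus one more flexible model.
models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}


def block_auc(X, y, model, folds=5):
    """Mean cross-validated ROC AUC for one feature block and one model."""
    return cross_val_score(model, X, y, cv=folds, scoring="roc_auc").mean()


# Score every (feature block, model) pair.
results = {
    (block, name): block_auc(X, y, model)
    for block, X in blocks.items()
    for name, model in models.items()
}

for (block, name), auc in sorted(results.items()):
    print(f"{block:10s} {name:14s} AUC={auc:.3f}")
```

Reading the printed table by block shows which data type is most predictive; reading it by model within a block shows how much the more flexible learner adds over logistic regression.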
Abstract
As we gain access to a greater depth and range of health-related information about individuals, three questions arise: (1) Can we build better models to predict individual-level risk of ill health? (2) How much data do we need to effectively predict ill health? (3) Are new methods required to process the added complexity that new forms of data bring? The aim of the study is to apply a machine learning approach to identify the relative contribution of personal, social, health-related, biomarker and genetic data as predictors of future health in individuals. Using longitudinal data from 6830 individuals in the UK from Understanding Society (2010-12 to 2015-17), the study compares the predictive performance of five types of measures: personal (e.g. age, sex), social (e.g. occupation, education), health-related (e.g. body weight, grip strength), biomarker (e.g. cholesterol, hormones) and…
Taxonomy
Topics: Health, Environment, Cognitive Aging · Artificial Intelligence in Healthcare · Healthcare Systems and Public Health
Methods: Logistic Regression
