Machine Learning in Epidemiology

Marvin N. Wright; Lukas Burk; Pegah Golchian; Jan Kapar; Niklas Koenen; Sophie Hanna Langbein

arXiv:2602.16352 · stat.ML · February 19, 2026

Open Access

TL;DR

This paper reviews how machine learning techniques can be effectively applied in epidemiology, emphasizing methods, evaluation strategies, and interpretability, supported by practical R code examples using heart disease data.

Contribution

It provides a comprehensive methodological foundation for applying machine learning in epidemiology, including principles, methods, evaluation, and interpretability, with practical R examples.

Findings

01

Introduces core machine learning principles for epidemiology

02

Details strategies for model evaluation and hyperparameter tuning

03

Provides practical R code examples with heart disease data

Abstract

In the age of digital epidemiology, epidemiologists are faced with an increasing amount of data of growing complexity and dimensionality. Machine learning offers a set of powerful tools that can help to analyze such data. This chapter lays the methodological foundations for successfully applying machine learning in epidemiology. It covers the principles of supervised and unsupervised learning and discusses the most important machine learning methods. Strategies for model evaluation and hyperparameter optimization are developed, and interpretable machine learning is introduced. These theoretical parts are accompanied by code examples in R, with an example dataset on heart disease used throughout the chapter.

Taxonomy

Topics: Artificial Intelligence in Healthcare · Machine Learning in Healthcare · Statistical Methods in Epidemiology