# Debiased machine learning for ultra-high dimensional mediation analysis

**Authors:** Kecheng Wei, Yahang Liu, Chen Huang, Ruilang Lin, Yongfu Yu, Guoyou Qin

PMC · DOI: 10.1093/bioinformatics/btaf282 · 2025-05-05

## TL;DR

This paper introduces a debiased machine learning framework for ultra-high dimensional mediation analysis. It reduces the bias that flexible ML estimators introduce under complex confounding, so that key mediators can be identified and their contributions estimated and tested.

## Contribution

The paper proposes a debiased machine learning framework for ultra-high dimensional mediation analysis that combines an orthogonalized score function with cross-fitting.
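
For intuition, one standard form of an orthogonalized score is sketched below. It assumes a partially linear model for a single candidate mediator, $M_j = \alpha_j X + g_j(Z) + e_j$, with exposure $X$ and confounders $Z$. This model and all notation are illustrative assumptions and are not taken from the paper, whose exact score may differ.

$$
\psi(W; \alpha_j, \eta) = \bigl\{ M_j - \ell_j(Z) - \alpha_j \bigl( X - m(Z) \bigr) \bigr\} \bigl( X - m(Z) \bigr),
\qquad \ell_j(Z) = \mathbb{E}[M_j \mid Z], \quad m(Z) = \mathbb{E}[X \mid Z].
$$

Setting the sample average of $\psi$ to zero gives a residual-on-residual regression. Because the derivative of $\mathbb{E}[\psi]$ with respect to the nuisance functions $\eta = (\ell_j, m)$ vanishes at their true values, first-order errors in the ML estimates of $\ell_j$ and $m$ do not carry over into $\hat{\alpha}_j$. Cross-fitting then estimates $\eta$ on folds that exclude the observations at which $\psi$ is evaluated.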

## Key findings

- The proposed method outperforms existing approaches in simulations with complex confounding.
- In Alzheimer’s Disease Neuroimaging Initiative data, DNA methylation at several cytosine-phosphate-guanine (CpG) sites is identified as mediating the effect of body mass index (BMI) on Alzheimer’s Disease.
- Screening and regularization techniques effectively handle ultra-high dimensional mediators.

## Abstract

In ultra-high dimensional mediation analysis, confounding variables can influence both mediators and outcomes through complex functional forms. While machine learning (ML) approaches are effective at modeling such complex relationships, they can introduce bias when estimating mediation effects. In this article, we propose a debiased ML framework that mitigates this bias, enabling accurate identification of key mediators and precise estimation and inference of their respective contributions.

We construct an orthogonalized score function and use cross-fitting to reduce bias introduced by ML. To tackle ultra-high dimensional potential mediators, we implement screening and regularization techniques for variable selection and effect estimation. For statistical inference of the mediators’ contributions, we use an adjusted Sobel-type test. Simulation results demonstrate the superior performance of the proposed method in handling complex confounding. Applying this method to Alzheimer’s Disease Neuroimaging Initiative data, we identify several cytosine-phosphate-guanine sites where DNA methylation mediates the effect of body mass index on Alzheimer’s Disease.
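
The cross-fitting and Sobel-type steps described above can be illustrated with a minimal Python sketch for a single mediator. It is not the authors' implementation (that is the R function `DML_HDMA`). Everything here is an assumption made for illustration: the simulated data, the polynomial ridge regression standing in for an ML learner, the helper names, and the use of the classic Sobel statistic rather than the paper's adjusted version. Screening and regularization over many mediators are omitted.

```python
import numpy as np

def ridge_poly_predict(Z_train, t_train, Z_test, degree=5, lam=1e-2):
    """Hypothetical stand-in for an ML learner: ridge on per-column polynomial features."""
    def feats(Z):
        return np.column_stack([np.ones(len(Z))] + [Z ** d for d in range(1, degree + 1)])
    A, B = feats(Z_train), feats(Z_test)
    coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ t_train)
    return B @ coef

def cross_fit_residual(Z, t, n_folds=5, seed=0):
    """Residual t - E[t | Z], with each fold predicted by a model fit on the other folds."""
    n = len(t)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    resid = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        resid[test_idx] = t[test_idx] - ridge_poly_predict(Z[train], t[train], Z[test_idx])
    return resid

def ols_robust(y, D):
    """OLS without intercept, returning coefficients and sandwich standard errors."""
    DtD_inv = np.linalg.inv(D.T @ D)
    coef = DtD_inv @ D.T @ y
    u = y - D @ coef
    meat = (D * u[:, None]).T @ (D * u[:, None])
    return coef, np.sqrt(np.diag(DtD_inv @ meat @ DtD_inv))

# Simulated data (assumed for illustration): true alpha = 0.5, true beta = 0.8.
rng = np.random.default_rng(1)
n = 2000
Z = rng.uniform(-2, 2, size=(n, 2))                      # confounders
X = np.sin(Z[:, 0]) + 0.5 * Z[:, 1] ** 2 + rng.normal(size=n)             # exposure
M = 0.5 * X + Z[:, 0] ** 2 - np.sin(Z[:, 1]) + rng.normal(size=n)         # mediator
Y = 0.3 * X + 0.8 * M + np.sin(Z[:, 0]) + Z[:, 1] ** 2 + rng.normal(size=n)  # outcome

# Partial the confounders out of every variable, using the same folds throughout.
x_res, m_res, y_res = (cross_fit_residual(Z, v) for v in (X, M, Y))

# Exposure -> mediator path (alpha) and mediator -> outcome path (beta, adjusting for X).
a_coef, a_se = ols_robust(m_res, x_res[:, None])
b_coef, b_se = ols_robust(y_res, np.column_stack([x_res, m_res]))
alpha_hat, alpha_se = a_coef[0], a_se[0]
beta_hat, beta_se = b_coef[1], b_se[1]

# Classic Sobel statistic for the indirect effect alpha * beta.
z_stat = alpha_hat * beta_hat / np.sqrt(alpha_hat**2 * beta_se**2 + beta_hat**2 * alpha_se**2)
print(alpha_hat, beta_hat, z_stat)
```

Reusing the same fold assignment for the exposure, mediator, and outcome keeps each observation's three residuals out-of-sample with respect to the same training data, which is the point of cross-fitting.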

The R function DML_HDMA implementing the proposed methods is available online at https://github.com/Wei-Kecheng/DML_HDMA.

## Linked entities

- **Diseases:** Alzheimer’s Disease (MONDO:0004975)

## Full-text entities

- **Diseases:** AD (MESH:D000544)

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12198499/full.md

---
Source: https://tomesphere.com/paper/PMC12198499