Debiased machine learning for ultra-high dimensional mediation analysis

Kecheng Wei; Yahang Liu; Chen Huang; Ruilang Lin; Yongfu Yu; Guoyou Qin

PMC · DOI: 10.1093/bioinformatics/btaf282 · May 5, 2025

Open Access

TL;DR

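This paper introduces a debiased machine learning framework for ultra-high dimensional mediation analysis. It reduces the bias that ML-based confounder adjustment introduces into mediation effect estimates, so that key mediators can be identified and their contributions estimated and tested.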

Contribution

The paper proposes a debiased machine learning framework for ultra-high dimensional mediation analysis that combines an orthogonalized score function with cross-fitting.
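To make the pairing of an orthogonalized score with cross-fitting concrete, here is a minimal sketch for a single exposure coefficient in a partially linear model. It is a generic partialling-out illustration, not the authors' estimator: the function names (`fit_predict`, `cross_fit_effect`), the polynomial ridge learner standing in for an ML method, the single confounder `c`, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def fit_predict(c_train, t_train, c_test, degree=3, ridge=1e-6):
    """Hypothetical nuisance learner: polynomial ridge regression of t on c.

    Stands in for any flexible ML regressor of a target on the confounder.
    """
    A = np.vander(c_train, degree + 1, increasing=True)  # columns 1, c, ..., c^degree
    coef = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ t_train)
    return np.vander(c_test, degree + 1, increasing=True) @ coef

def cross_fit_effect(y, x, c, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*x + g(c) + noise."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    x_res = np.empty(n)
    for held_out in folds:
        # Nuisance functions are fit on the other folds only, then
        # evaluated on the held-out fold (this is the cross-fitting step).
        train = np.setdiff1d(np.arange(n), held_out)
        y_res[held_out] = y[held_out] - fit_predict(c[train], y[train], c[held_out])
        x_res[held_out] = x[held_out] - fit_predict(c[train], x[train], c[held_out])
    # Orthogonalized (residual-on-residual) score solved for theta.
    theta = (x_res @ y_res) / (x_res @ x_res)
    eps = y_res - theta * x_res
    se = np.sqrt(np.sum(x_res**2 * eps**2)) / (x_res @ x_res)  # sandwich standard error
    return theta, se

# Simulated data (assumed): nonlinear confounding, true effect 0.5.
rng = np.random.default_rng(1)
c = rng.uniform(-2, 2, 2000)
x = np.sin(c) + rng.normal(size=2000)
y = 0.5 * x + c**2 + rng.normal(size=2000)
theta_hat, se_hat = cross_fit_effect(y, x, c)
print(f"theta_hat = {theta_hat:.3f}, se = {se_hat:.3f}")
```

Because both the outcome and the exposure are residualized against the confounder, small errors in either nuisance fit affect the estimate only at second order, and fitting on held-out folds keeps overfitting in the learner from leaking into the final step.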

Findings

01

The proposed method outperforms existing approaches in simulations with complex confounding.

02

DNA methylation at specific cytosine-phosphate-guanine sites mediates the effect of BMI on Alzheimer’s Disease.

03

Screening and regularization techniques effectively handle ultra-high dimensional mediators.

Abstract

In ultra-high dimensional mediation analysis, confounding variables can influence both mediators and outcomes through complex functional forms. While machine learning (ML) approaches are effective at modeling such complex relationships, they can introduce bias when estimating mediation effects. In this article, we propose a debiased ML framework that mitigates this bias, enabling accurate identification of key mediators and precise estimation and inference of their respective contributions. We construct an orthogonalized score function and use cross-fitting to reduce bias introduced by ML. To tackle ultra-high dimensional potential mediators, we implement screening and regularization techniques for variable selection and effect estimation. For statistical inference of the mediators’ contributions, we use an adjusted Sobel-type test. Simulation results demonstrate the superior…
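For orientation on the Sobel-type test named in the abstract, the sketch below computes the classic first-order Sobel statistic for one mediator's indirect effect. The paper's adjustment is not described in this excerpt and is not reproduced here; the function name `sobel_test` and the input values in the usage line are hypothetical.

```python
import math

def sobel_test(alpha, se_alpha, beta, se_beta):
    """Classic first-order Sobel test for the indirect effect alpha * beta.

    alpha: exposure -> mediator coefficient, beta: mediator -> outcome coefficient.
    Returns (indirect effect, standard error, z statistic, two-sided p-value).
    """
    indirect = alpha * beta
    # Delta-method standard error of the product of two estimates.
    se = math.sqrt(alpha**2 * se_beta**2 + beta**2 * se_alpha**2)
    z = indirect / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return indirect, se, z, p

# Hypothetical estimates for a single mediator.
indirect, se, z, p = sobel_test(alpha=0.5, se_alpha=0.1, beta=0.4, se_beta=0.1)
print(f"indirect = {indirect:.3f}, z = {z:.3f}, p = {p:.4f}")
```

The plain Sobel test is known to be conservative when the indirect effect is near zero, which is one common motivation for adjusted variants.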

Taxonomy

Topics: Anomaly Detection Techniques and Applications