A Framework for Mediation Analysis with Massive Data
Haixiang Zhang, Xin Li

TL;DR
This paper introduces scalable algorithms for mediation analysis tailored for massive datasets, significantly improving computational efficiency while maintaining statistical accuracy, demonstrated through simulations and real data applications.
Contribution
It proposes subsampled double bootstrap and divide-and-conquer algorithms for efficient mediation analysis in big data contexts, addressing computational challenges.
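To illustrate the divide-and-conquer idea, here is a minimal sketch. It is not the authors' algorithm, whose details this summary does not give. It assumes a simple linear mediation model with one exposure `x`, one mediator `m`, and one outcome `y`, and it uses plain averaging of block-wise least-squares estimates. The function names `fit_block` and `divide_and_conquer_mediation`, the simulated coefficients, and the block count are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_block(x, m, y):
    """OLS estimates (a, b, direct) on one block of an assumed linear mediation model."""
    ones = np.ones_like(x)
    # Mediator model: m = a0 + a * x + error
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    # Outcome model: y = b0 + direct * x + b * m + error
    coef = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0]
    return a, coef[2], coef[1]

def divide_and_conquer_mediation(x, m, y, n_blocks):
    """Average block-wise estimates, then form the indirect effect a * b."""
    blocks = np.array_split(np.arange(len(x)), n_blocks)
    est = np.array([fit_block(x[i], m[i], y[i]) for i in blocks])
    a_bar, b_bar, direct_bar = est.mean(axis=0)
    return {"indirect": a_bar * b_bar, "direct": direct_bar}

# Simulated data (assumed coefficients: a = 0.5, b = 0.4, direct = 0.3).
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.3 * x + 0.4 * m + rng.normal(size=n)

result = divide_and_conquer_mediation(x, m, y, n_blocks=20)
print(result)
```

With the simulated coefficients above, the true indirect effect is 0.5 × 0.4 = 0.2 and the true direct effect is 0.3, so the averaged estimates should land close to those values. Each block is fitted independently, which is what makes the approach attractive when the full dataset does not fit in memory.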
Findings
The proposed algorithms outperform traditional methods in speed and computational efficiency.
They maintain confidence interval coverage and statistical power.
The methods are validated through simulations and real-world data examples.
Abstract
During the past few years, mediation analysis has gained increasing popularity across various research fields. The primary objective of mediation analysis is to examine the direct impact of exposure on outcome, as well as the indirect effects that occur along the pathways from exposure to outcome. A great number of articles have applied mediation analysis to data from hundreds or thousands of individuals. With the rapid development of technology, the volume of available data increases exponentially, which brings new challenges to researchers. Directly conducting statistical analysis for large datasets is often computationally infeasible. Nonetheless, there is a paucity of findings regarding mediation analysis in the context of big data. In this paper, we propose utilizing subsampled double bootstrap and divide-and-conquer algorithms to conduct statistical mediation…
Taxonomy
Topics: Cognitive Science and Mapping · Bayesian Modeling and Causal Inference · Mental Health Research Topics
