Addressing Instrument-Outcome Confounding in Mendelian Randomization through Representation Learning

Shimeng Huang; Matthew Robinson; Francesco Locatello

arXiv:2602.19782 · cs.LG · February 24, 2026


Open Access

TL;DR

This paper introduces a representation learning method that exploits cross-environment invariance to recover the latent exogenous components of genetic instruments in Mendelian Randomization, addressing instrument-confounder dependence caused by population stratification or assortative mating.

Contribution

The paper proposes a framework that uses multi-environment data to recover the latent exogenous components of genetic instruments, supported by theoretical identification guarantees and by empirical validation in simulations and semi-synthetic experiments.

Findings

01 Identifies latent instruments in simulations

02 Demonstrates effectiveness on semi-synthetic data from All of Us

03 Provides theoretical guarantees for identification under various mixing mechanisms

Abstract

Mendelian Randomization (MR) is a prominent observational epidemiological research method designed to address unobserved confounding when estimating causal effects. However, core assumptions -- particularly the independence between instruments and unobserved confounders -- are often violated due to population stratification or assortative mating. Leveraging the increasing availability of multi-environment data, we propose a representation learning framework that exploits cross-environment invariance to recover latent exogenous components of genetic instruments. We provide theoretical guarantees for identifying these latent instruments under various mixing mechanisms and demonstrate the effectiveness of our approach through simulations and semi-synthetic experiments using data from the All of Us Research Hub.
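
To make the violated assumption concrete, the following is a minimal toy sketch, not the authors' method. It assumes a linear model with one instrument `z`, one exposure `x`, one outcome `y`, and an unobserved confounder `u` that stands in for something like population structure. All names, coefficients, and the helper functions `simulate` and `wald_ratio` are illustrative assumptions. The instrument is built as an exogenous part `e` plus a multiple of `u`, so the standard ratio estimate is biased when `z` is used directly, while an oracle that sees `e` is not. In practice `e` is unobserved; recovering such a component is the problem the paper targets.

```python
import numpy as np


def wald_ratio(z, x, y):
    """Single-instrument IV (Wald ratio) estimate: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


def simulate(n, instrument_confounding, true_effect=0.5, seed=0):
    """Toy linear data where the instrument may depend on the confounder.

    All coefficients are arbitrary illustrative choices.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                  # unobserved confounder
    e = rng.normal(size=n)                  # exogenous part of the instrument
    z = e + instrument_confounding * u      # observed instrument
    x = z + u + rng.normal(size=n)          # exposure
    y = true_effect * x + u + rng.normal(size=n)  # outcome
    return z, x, y, e


# Instrument correlated with the confounder: the naive estimate is biased.
z, x, y, e = simulate(n=200_000, instrument_confounding=1.0)
naive = wald_ratio(z, x, y)

# Oracle estimate using the (normally unobserved) exogenous component.
oracle = wald_ratio(e, x, y)

print(f"true effect: 0.5, naive: {naive:.3f}, oracle: {oracle:.3f}")
```

Under these assumed coefficients, with c denoting `instrument_confounding`, the naive ratio converges to the true effect plus c / (1 + c + c^2), which is about 0.83 for c = 1, whereas the oracle ratio converges to the true effect of 0.5.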


Taxonomy

Topics: Genetic Associations and Epidemiology · Genetic Mapping and Diversity in Plants and Animals · Advanced Causal Inference Techniques