# Variant selection to maximize variance explained in cis-Mendelian randomization

**Authors:** Ang Zhou, Ville Karhunen, Haodong Tian, Janne Pott, Ashish Patel, Eric A.W. Slob, Stephen Burgess

PMC · DOI: 10.1016/j.xhgg.2026.100573 · Human Genetics and Genomics Advances · 2026-01-16

## TL;DR

This paper shows that using multiple correlated genetic variants from a gene region, rather than the lead variant alone, increases instrument strength (variance explained in the exposure) and can yield more precise cis-Mendelian randomization estimates.

## Contribution

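The study compares four variant selection methods that incorporate correlated non-lead variants in cis-Mendelian randomization (LD-pruning, COJO, SuSiE regression, and PCA) and evaluates how much each increases instrument strength relative to a lead-variant-only approach.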

## Key findings

- In the haptoglobin (HP) gene region, the four methods increased the variance explained in the exposure (R2) by a median of 145.1% relative to the lead variant alone (range: 69.6%–169.4%).
- In simulated traits with known variance explained, all methods recovered the expected genetic variance.
- Across 15 gene regions, methods incorporating non-lead variants consistently outperformed the lead-variant-only approach.

## Abstract

Optimal selection of instrumental variables (IVs) from a single gene region in cis-Mendelian randomization (MR) is challenging, as variants are highly correlated due to linkage disequilibrium (LD). Using only the lead variant is convenient but may not achieve full statistical power if multiple signals exist. We compared four selection methods that incorporate correlated non-lead variants, namely LD-pruning, conditional and joint analysis (COJO), sum of single effects (SuSiE) regression, and principal component analysis (PCA), and evaluated their ability to increase instrument strength, measured by variance explained in the exposure (R2), relative to the lead-variant-only approach. We applied these methods to circulating haptoglobin (HP), to simulated traits with known variance explained, and to 15 additional gene regions where non-lead cis-protein quantitative trait loci (pQTLs) contributed varying proportions of cis-genetic variance. R2 was estimated from variant-protein association estimates (Fenland study, n = 10,708) using LD from the UK Biobank (n = 356,557). In the HP region, the four methods produced a median proportional gain in R2 of 145.1% compared with the lead variant alone (range: 69.6%–169.4%), with a median change in the MR standard error of −36.3% (range: −37.9% to −19.3%). In simulations, all methods were able to recover the expected genetic variance. Across the 15 gene regions, methods incorporating non-lead variants consistently outperformed the lead-variant-only approach. Variant selection methods incorporating correlated non-lead variants can reliably improve instrument strength in cis-MR analyses. We recommend using such methods but advise comparing their estimates with the lead-variant-only estimate to safeguard against numerical instability.
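
The abstract describes estimating R2 from variant-protein association estimates together with an LD matrix. One standard way to do this for correlated variants is the quadratic form b'R⁻¹b, where b holds marginal standardized effects and R is the variant correlation matrix. The sketch below illustrates that calculation only; it is not taken from the paper, the function name `r2_from_summary` is hypothetical, the numbers are toy values, and it assumes both genotypes and exposure are standardized to unit variance.

```python
import numpy as np

def r2_from_summary(beta_std, ld):
    """Variance explained in the exposure by a set of correlated variants.

    beta_std: marginal standardized effects (assumed per-SD genotype on
              per-SD exposure), one per variant
    ld:       correlation (LD) matrix for the same variants, same order
    """
    beta_std = np.asarray(beta_std, dtype=float)
    ld = np.asarray(ld, dtype=float)
    # Joint (conditional) effects solve ld @ joint = beta_std;
    # solving is more stable than forming the explicit inverse.
    joint = np.linalg.solve(ld, beta_std)
    return float(beta_std @ joint)

# Toy inputs (hypothetical): a lead variant and one non-lead variant
# with pairwise correlation 0.3.
beta = np.array([0.20, 0.15])
ld = np.array([[1.0, 0.3],
               [0.3, 1.0]])

r2_lead = r2_from_summary(beta[:1], ld[:1, :1])  # lead variant only
r2_both = r2_from_summary(beta, ld)              # lead plus non-lead
gain = (r2_both - r2_lead) / r2_lead             # proportional gain in R2

print(f"lead-only R2: {r2_lead:.4f}")
print(f"two-variant R2: {r2_both:.4f}")
print(f"proportional gain: {gain:.1%}")
```

When variants are in very high LD the matrix becomes near-singular and this quadratic form can be unstable, which is one reason to compare against the lead-variant-only result.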

Variant selection strategies for cis-Mendelian randomization that incorporate multiple correlated variants in a gene region can explain more exposure variance than a lead-variant-only strategy. Estimates using correlated variants can be more precise but should be compared to the lead-variant-only estimate to check for overprecision due to numerical instability.
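
As a rough plausibility check on how a gain in R2 relates to precision, one can use the first-order approximation that the MR standard error scales as 1/sqrt(R2) when sample sizes and the outcome association are held fixed. The sketch below applies that approximation to the proportional gains quoted in the abstract. The scaling rule is an assumption introduced here for illustration, not the paper's calculation, and `approx_se_change` is a hypothetical helper name.

```python
import math

def approx_se_change(r2_gain):
    """Approximate proportional change in the MR standard error.

    r2_gain: proportional gain in R2 (e.g. 1.451 for a 145.1% gain).
    Assumes SE is proportional to 1 / sqrt(R2), a first-order approximation.
    """
    return 1.0 / math.sqrt(1.0 + r2_gain) - 1.0

# Proportional R2 gains reported for the HP region (minimum, median, maximum).
for gain in (0.696, 1.451, 1.694):
    change = approx_se_change(gain)
    print(f"R2 gain {gain:.1%} -> approximate SE change {change:.1%}")
```

For the median gain of 145.1% this approximation gives a change of roughly −36%, close to the reported median. The reported range need not match the approximation exactly, since the medians and ranges are taken across methods and the published estimates come from the full analysis.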

## Linked entities

- **Genes:** HP (haptoglobin) [NCBI Gene 3240]

## Full-text entities

- **Genes:** HP (haptoglobin) [NCBI Gene 3240] {aka HP2ALPHA2, HPA1S}

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12887808/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12887808/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/PMC12887808/full.md

---
Source: https://tomesphere.com/paper/PMC12887808