# A Modification to Two‐Stage Least Squares With Genetic Applications

**Authors:** Lei Fang, Wei Pan

PMC12593333 · DOI: 10.1002/sim.70308 · Statistics in Medicine · 2025-11-07

## TL;DR

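This paper introduces a modification of two-stage least squares (2SLS), called reverse two-stage least squares (r2SLS), for transcriptome- and proteome-wide association studies (TWAS/PWAS). The goal is to reduce the attenuation bias caused by a small stage-1 sample and to increase statistical power.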

## Contribution

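The novel contribution is r2SLS, which reverses the roles of the two stages. It uses the genetic instrumental variables (IVs) to predict the outcome rather than the exposure, and then tests the association between the predicted outcome and the observed exposure (e.g., gene expression).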
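A standard linear IV model shows why reversing the prediction step still targets the causal effect. The model is assumed here for illustration and is not taken from the paper's derivation. Let $Z$ be the genotypes (a row vector per individual), $X$ the exposure, and $Y$ the outcome. Unmeasured confounding enters through errors $u$ and $\varepsilon$, which may be correlated with each other but are independent of $Z$:

$$
X = Z\alpha + u, \qquad Y = X\beta + \varepsilon
\;\Longrightarrow\;
Y = Z(\alpha\beta) + (u\beta + \varepsilon) = Z\gamma + v, \qquad \gamma = \alpha\beta .
$$

$$
\operatorname{Cov}(X, Z\gamma) = \alpha^\top \Sigma_Z \gamma = \beta\,\alpha^\top \Sigma_Z \alpha,
\qquad
\operatorname{Cov}(Z\alpha, Y) = \beta\,\alpha^\top \Sigma_Z \alpha,
\qquad
\Sigma_Z = \operatorname{Var}(Z).
$$

When the IVs are relevant ($\alpha^\top \Sigma_Z \alpha > 0$), the predicted outcome $Z\gamma$ is associated with the observed exposure $X$ if and only if $\beta \neq 0$. This is the mirror image of the 2SLS association between the imputed exposure $Z\alpha$ and the observed outcome $Y$.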

## Key findings

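- The r2SLS estimator is asymptotically unbiased and asymptotically normally distributed.
- The authors characterize theoretically when r2SLS and 2SLS are asymptotically equivalent and when r2SLS is asymptotically more efficient than 2SLS.
- In simulations and three real-data applications (GTEx gene expression, UKB-PPP proteomics, and GWAS summary data), r2SLS showed possibly better type I error control, higher statistical power, and greater robustness to weak IVs than 2SLS.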

## Abstract

Two‐stage least squares (2SLS) is by default applied to infer a putative causal association between an exposure, such as a gene or a protein, with an outcome such as a complex disease or trait, in transcriptome‐ or proteome‐wide association studies (TWAS/PWAS). In a typical two‐sample setting for TWAS/PWAS, the stage 1 sample size is much smaller than that of stage 2. To reduce the resulting attenuation bias and estimation uncertainty in stage 1 and boost the statistical power of the conventional TWAS, we propose a new method, called reverse two‐stage least squares (r2SLS): Instead of imputing a gene's expression (using genetic variants as instrumental variables, IVs) in stage 1 and then testing the association between the imputed expression and the observed outcome in stage 2 in the conventional 2SLS approach, we propose predicting the outcome (using IVs) and testing the association between the predicted outcome and the observed gene expression. Theoretically, we establish that the r2SLS estimator is asymptotically unbiased with a normal distribution. We also show theoretically when 2SLS and r2SLS are asymptotically equivalent and when r2SLS is asymptotically more efficient than 2SLS. We also consider the practical issue of how to select invalid IVs. We use simulations and three real data examples based on the GTEx gene expression data, UKB‐PPP proteomic data, and several GWAS summary datasets to demonstrate some advantages of r2SLS over 2SLS, including possibly better type I error control, higher statistical power and robustness to weak IVs.
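The simulation below contrasts the two pipelines described in the abstract. It is a minimal sketch in pure Python under several assumptions that are not from the paper:

- a simple linear two-sample model with standardized, independent IVs and one unmeasured confounder;
- unpenalized least squares for the prediction weights;
- a naive correlation t-statistic that ignores the uncertainty in the estimated weights.

All function names (`ols`, `predict`, `assoc_t`, `simulate`, `two_sls_t`, `r2sls_t`) are hypothetical. This is not the paper's r2SLS estimator or its variance formula. It only shows which quantity each approach predicts, which it observes, and which sample supplies the prediction weights.

```python
import math
import random

# Minimal sketch (hypothetical names, simplified linear model);
# not the paper's r2SLS estimator or its variance formula.


def ols(Z, y):
    """Least-squares coefficients for y ~ Z (no intercept), via the normal equations."""
    p = len(Z[0])
    # Augmented matrix [Z'Z | Z'y]
    M = [[sum(row[i] * row[j] for row in Z) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(Z, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(Z, coef):
    """Linear predictor Z @ coef for each row of Z."""
    return [sum(z * w for z, w in zip(row, coef)) for row in Z]


def assoc_t(a, b):
    """t-statistic for the slope of a simple regression (with intercept).

    Equals r * sqrt((n - 2) / (1 - r^2)), so it is symmetric in a and b.
    """
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    r = sab / math.sqrt(saa * sbb)
    return r * math.sqrt((n - 2) / (1 - r * r))


def simulate(n, alpha, beta, rng):
    """Hypothetical model: X = Z.alpha + C + noise, Y = beta*X + C + noise."""
    Z, X, Y = [], [], []
    for _ in range(n):
        z = [rng.gauss(0, 1) for _ in alpha]  # standardized, independent IVs (simplification)
        c = rng.gauss(0, 1)                   # unmeasured confounder of X and Y
        x = sum(a * zi for a, zi in zip(alpha, z)) + c + rng.gauss(0, 1)
        Z.append(z)
        X.append(x)
        Y.append(beta * x + c + rng.gauss(0, 1))
    return Z, X, Y


def two_sls_t(Z1, X1, Z2, Y2):
    """Conventional 2SLS/TWAS: learn X ~ Z in small sample 1, impute X in sample 2, test vs observed Y."""
    alpha_hat = ols(Z1, X1)
    return assoc_t(predict(Z2, alpha_hat), Y2)


def r2sls_t(Z1, X1, Z2, Y2):
    """Reverse direction: learn Y ~ Z in large sample 2, predict Y in sample 1, test vs observed X."""
    gamma_hat = ols(Z2, Y2)
    return assoc_t(predict(Z1, gamma_hat), X1)


rng = random.Random(2025)
alpha = [0.5, 0.5, 0.5, 0.0, 0.0]
Z1, X1, _ = simulate(200, alpha, beta=1.0, rng=rng)    # small expression sample: (Z, X) observed
Z2, _, Y2 = simulate(2000, alpha, beta=1.0, rng=rng)   # large GWAS sample: (Z, Y) observed
print("2SLS t:", round(two_sls_t(Z1, X1, Z2, Y2), 2))
print("r2SLS t:", round(r2sls_t(Z1, X1, Z2, Y2), 2))
```

The simple-regression t-statistic is symmetric in its two variables, so regression direction does not matter for this naive test. The two approaches differ in where the prediction weights are learned. Conventional 2SLS learns them from the small expression sample; r2SLS learns them from the large GWAS sample. That difference is the source of the reduced stage-1 attenuation and uncertainty that the abstract describes.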

## Full-text entities

- **Genes:**
  - IL34 (interleukin 34) [NCBI Gene 146433] {aka C16orf77, IL-34}
  - TREM2 (triggering receptor expressed on myeloid cells 2) [NCBI Gene 54209] {aka AD17, PLOSL2, TREM-2, Trem2a, Trem2b, Trem2c}
  - CTSB (cathepsin B) [NCBI Gene 1508] {aka APPS, CPSB, KWE, RECEUP}
  - SORT1 (sortilin 1) [NCBI Gene 6272] {aka Gp95, LDLCQ6, NT3, NTR3}
  - APOE (apolipoprotein E) [NCBI Gene 348] {aka AD2, APO-E, ApoE4, LDLCQ5, LPG}
  - MME (membrane metalloendopeptidase) [NCBI Gene 4311] {aka CALLA, CD10, CMT2T, NEP, SCA43, SFE}
  - CTSH (cathepsin H) [NCBI Gene 1512] {aka ACC-4, ACC-5, ACC4, ACC5, CPSB}
  - GRN (granulin precursor) [NCBI Gene 2896] {aka CLN11, FTD2, GEP, GP88, PCDGF, PEPI}
  - ICA1 (islet cell autoantigen 1) [NCBI Gene 3382] {aka ICA69, ICAp69}
  - EPHA1 (EPH receptor A1) [NCBI Gene 2041] {aka EPH, EPHT, EPHT1}
- **Diseases:** Alzheimer's disease (AD; MESH:D000544), coronary artery disease (CAD; MESH:D003327), asthma (MESH:D001249)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12593333/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12593333/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12593333/full.md

---
Source: https://tomesphere.com/paper/PMC12593333