# Leveraging large-scale biobanks for therapeutic target discovery

**Authors:** Brian R. Ferolito, Hesam Dashti, Claudia Giambartolomei, Gina M. Peloso, Daniel J. Golden, Kai Gravel-Pucillo, Danielle Rasooly, Andrea R.V.R. Horimoto, Rachael Matty, Liam Gaziano, Yi Liu, Ines A. Smit, Barbara Zdrazil, Yakov Tsepilov, Lauren Costa, Nicole Kosik, Jennifer E. Huffman, Gian Gaetano Tartaglia, Giorgio Bini, Gabriele Proietti, Harris Ioannidis, Mohd A. Karim, Fiona Hunter, Gibran Hemani, Adam S. Butterworth, Emanuele Di Angelantonio, Claudia Langenberg, Maya Ghoussaini, Andrew R. Leach, Katherine P. Liao, Scott Damrauer, Luis E. Selva, Stacey Whitbourne, Philip S. Tsao, Jennifer Moser, Tom Gaunt, Tianxi Cai, John C. Whittaker, Juan P. Casas, Sumitra Muralidhar, J. Michael Gaziano, Kelly Cho, Alexandre C. Pereira

PMC · DOI: 10.1016/j.xhgg.2025.100556 · Human Genetics and Genomics Advances · 2025-12-09

## TL;DR

This study uses genetic association data from large biobanks to identify gene-trait relationships that may point to new drug targets, and it builds a model to rank their therapeutic potential.

## Contribution

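The study harmonizes genetic association results from large biobanks (including MVP, the UK Biobank, and FinnGen) and applies two-sample Mendelian randomization (MR) and machine learning to identify and rank gene-trait pairs for drug development.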

## Key findings

- 69,669 gene-trait pairs with evidence for causal effects (p ≤ 1.6 × 10⁻⁹) were identified across 2,003 traits, involving 6,447 genes.
- 9% of approved drug targets in ChEMBL 34 were rediscovered using the MR approach.
- A machine learning ranking model predicted which significant MR results correspond to an approved drug and its clinical indication (reported area under the curve 0.79).

## Abstract

Large biobanks, including the Million Veteran Program (MVP), the UK Biobank, and FinnGen, provide genetic association results for more than 1 million individuals for hundreds of phenotypes. To select targets for pharmaceutical development, as well as to improve the understanding of existing targets, we harmonized these studies and performed two-sample Mendelian randomization (MR) on 2,003 phenotypes using genetic variants associated with gene expression (derived from GTEx and eQTLGen) and plasma protein levels (derived from ARIC, Fenland, and deCODE) as proxies of target modulation. We found 69,669 gene-trait pairs with evidence (p ≤ 1.6 × 10⁻⁹) for causal effects. From the selected gene-trait pairs, we observed 6,447 genes with strong causal evidence for at least one of 2,003 investigated traits. As expected, being identified as a gene-trait pair in our approach was significantly associated with higher odds of being an approved drug target and indication. We were able to rediscover 9% of approved drug targets in ChEMBL 34. Moreover, identified gene-traits were significantly associated with higher odds of being previously described as a gene-trait pair in OMIM, ClinVar, mouse knockout data, and rare variant burden studies. To enhance the translational potential of the resource, we developed a predictive ranking model trained using approved drug targets described in ChEMBL 34 as well as several different biological annotations. This model was able to accurately predict the odds of a particular significant MR result being developed into an approved drug and its clinical indication (precision-recall area under the receiver operating characteristic curve 0.79). We make our results publicly available in CIPHER.

Using two-sample Mendelian randomization with eQTL and pQTL instruments, the study harmonizes genome-wide association study data from >1 million participants across 2,003 traits and uncovers 69,669 significant gene-trait links involving 6,447 genes, capturing 9% of approved drug targets. A follow-on machine learning model ranks each link’s therapeutic promise with a precision-recall area under the receiver operating characteristic curve of 0.79.
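To make the two-sample MR step concrete, here is a minimal sketch of a single-instrument Wald ratio estimate, one common estimator in this setting. This summary does not state which MR estimators the authors used, so the function name `wald_ratio`, the first-order standard error, and the input numbers are illustrative assumptions. Only the significance threshold (1.6 × 10⁻⁹) comes from the abstract.

```python
import math

# Significance threshold reported in the abstract.
THRESHOLD = 1.6e-9


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Hypothetical helper: Wald ratio for one genetic instrument.

    beta_exposure: variant effect on the exposure (e.g. an eQTL or pQTL effect)
    beta_outcome:  variant effect on the trait, from a separate sample
    se_outcome:    standard error of beta_outcome

    Uses the first-order approximation for the standard error, which
    ignores uncertainty in beta_exposure (an assumption of this sketch).
    """
    estimate = beta_outcome / beta_exposure
    se = abs(se_outcome / beta_exposure)
    z = estimate / se
    # Two-sided p-value under a standard normal null.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value


# Made-up inputs for illustration only; not values from the paper.
est, se, p = wald_ratio(beta_exposure=0.50, beta_outcome=0.10, se_outcome=0.01)
significant = p <= THRESHOLD
print(f"estimate={est:.3f} se={se:.3f} significant={significant}")
```

The estimate is the change in the trait per unit change in gene expression or protein level, under the usual MR assumptions. A gene-trait pair would count as significant in this sketch only when its p-value falls at or below the abstract's threshold.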

## Full-text entities

- **Species:** Mus musculus (house mouse, species) [taxon 10090]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12799792/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12799792/full.md

## References

76 references — full list in the complete paper: https://tomesphere.com/paper/PMC12799792/full.md

---
Source: https://tomesphere.com/paper/PMC12799792