# An integrative framework for drug target discovery bridging clinical trial and genetic data insights

**Authors:** Johanna Mielke, Tobias Strunz, Jeongah Lee, Patrick Schloemer, Giuseppe Gallone

PMC · DOI: 10.1186/s12967-026-07806-x · Journal of Translational Medicine · 2026-03-13

## TL;DR

This paper introduces a framework called 'back-translation' that combines clinical trial data with population-scale genetic data to discover new drug targets. It is demonstrated in a pilot study on fast kidney disease progression in chronic kidney disease (CKD).

## Contribution

The 'back-translation' framework derives a risk score from clinical trial data and transfers it to biobank data. There it serves as a proxy phenotype for genetic association testing to identify candidate therapeutic targets.

## Key findings

- The risk score accurately identified high-risk CKD patients in both FIDELITY and GCKD datasets.
- Genetic analysis of the risk score in the UK Biobank (UKBB) identified multiple candidate genes for novel therapeutic target investigation in kidney disease.
- The framework is generalizable and therapeutic-area-agnostic, offering a new approach to drug target discovery.

## Abstract

Over the past few years, the considerable growth in the availability of population-scale genomic data has provided a significant boost in supporting quantitative, well-powered, data-driven approaches to drug target discovery. However, population-scale genomic biobanks often lack comprehensive longitudinal phenotyping and in-depth clinical annotation. In contrast, clinical trial data, rich in phenotypic detail, frequently lack accompanying omics information, hindering mechanistic understanding of clinical findings. To address these shortcomings, we propose a framework called “back-translation” that translates patient insights from clinical data to the biobank context, enabling discoveries that draw on the unique strengths of both data types.

Our framework consists of two main steps. First, we identify a subgroup of interest within the clinical data and construct a classifier (risk score) to accurately identify patients in this subgroup. In the second step, we validate the derived risk score and then transfer it to the biobank data. The risk score serves as a proxy for characterizing the subgroup, which enables us to perform rare and common genetic variant association tests.
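The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the authors' pipeline. It assumes a logistic-regression classifier for subgroup membership, three hypothetical standardized baseline features, the fitted log-odds as the transferred risk score, and a simple per-variant linear regression (normal-approximation p-value) as the common-variant test. The paper's actual model, features, covariates, and association software may differ.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; returns [intercept, coef_1, ...]."""
    Xd = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ w))
        # Hessian X^T W X with W = p(1 - p); a tiny ridge term keeps it invertible
        H = Xd.T @ (Xd * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(Xd.shape[1])
        w = w + np.linalg.solve(H, Xd.T @ (y - p))
    return w

def risk_score(w, X):
    """Frozen risk score: the fitted log-odds of subgroup membership."""
    return w[0] + X @ w[1:]

def single_variant_test(score, dosage, covariates):
    """OLS of the score on one variant's dosage plus covariates.
    Returns (beta, two-sided p-value from a normal approximation to the t statistic)."""
    Xd = np.column_stack([np.ones(len(score)), dosage, covariates])
    beta, *_ = np.linalg.lstsq(Xd, score, rcond=None)
    resid = score - Xd @ beta
    sigma2 = resid @ resid / (len(score) - Xd.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(Xd.T @ Xd)[1, 1])
    z = beta[1] / se
    return beta[1], math.erfc(abs(z) / math.sqrt(2.0))

rng = np.random.default_rng(0)

# Step 1 (clinical trial data, synthetic): label = fast progressor yes/no,
# three hypothetical standardized baseline features.
n_trial = 2000
X_trial = rng.normal(size=(n_trial, 3))
true_w = np.array([-1.0, -1.2, 0.8, 0.3])
p_true = 1.0 / (1.0 + np.exp(-risk_score(true_w, X_trial)))
y_trial = (rng.random(n_trial) < p_true).astype(float)
w_hat = fit_logistic(X_trial, y_trial)

# Step 2 (biobank data, synthetic): apply the frozen weights, then test a variant.
n_bb = 5000
X_bb = rng.normal(size=(n_bb, 3))
dosage = rng.binomial(2, 0.3, size=n_bb).astype(float)
X_bb[:, 1] += 0.3 * dosage           # assumed variant effect on one feature
score_bb = risk_score(w_hat, X_bb)   # proxy phenotype for the subgroup
age = rng.normal(size=n_bb)          # hypothetical covariate
beta, p = single_variant_test(score_bb, dosage, age[:, None])
```

Freezing the fitted weights and applying them unchanged to the biobank is what turns the classifier into a transferable proxy phenotype. In real data this only works if the same clinical variables are available in the biobank and harmonized (units, measurement timing) with the trial.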

We demonstrate the value of this approach in a pilot study using clinical trial data from the FIDELITY dataset combined with biobank data from the UK Biobank (UKBB) and the German Chronic Kidney Disease (GCKD) cohort, focusing on fast kidney disease progression in patients with Chronic Kidney Disease (CKD). Our results show that the derived risk score accurately identifies high-risk patients in both FIDELITY and GCKD. Our genetic analysis of the clinical risk score in the UKBB identifies multiple genes that may serve as candidates for novel therapeutic target investigation.
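Two checks from this pilot can be illustrated the same way: external validation of a frozen risk score (as the abstract reports for GCKD) and a gene-level test of the score in a biobank-scale sample. The sketch below is hypothetical throughout. It assumes AUC as the validation metric, uses made-up score weights, simulates one unnamed gene with five rare variants, and applies a plain burden test (the score regressed on summed rare-allele counts). The paper's validation metrics and rare-variant methods may differ.

```python
import math
import numpy as np

def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control (ties count 1/2)."""
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def burden_test(score, genotypes, covariates):
    """Collapse rare-variant allele counts into one per-person burden, then OLS-test it.
    Returns (beta, two-sided p-value from a normal approximation)."""
    burden = genotypes.sum(axis=1)
    Xd = np.column_stack([np.ones(len(score)), burden, covariates])
    beta, *_ = np.linalg.lstsq(Xd, score, rcond=None)
    resid = score - Xd @ beta
    sigma2 = resid @ resid / (len(score) - Xd.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(Xd.T @ Xd)[1, 1])
    z = beta[1] / se
    return beta[1], math.erfc(abs(z) / math.sqrt(2.0))

rng = np.random.default_rng(1)
w = np.array([-1.0, -1.2, 0.8, 0.3])  # hypothetical frozen risk-score weights

# External validation: does the frozen score separate observed fast progressors?
n_ext = 1500
X_ext = rng.normal(size=(n_ext, 3))
lin = w[0] + X_ext @ w[1:]
y_ext = rng.random(n_ext) < 1.0 / (1.0 + np.exp(-lin))
val_auc = auc(lin, y_ext)

# Rare-variant burden test for one hypothetical gene in a biobank-like sample.
n_bb = 20000
G = rng.binomial(1, 0.002, size=(n_bb, 5)).astype(float)  # 5 rare variants
X_bb = rng.normal(size=(n_bb, 3))
X_bb[:, 1] += 0.8 * G.sum(axis=1)   # assumed carrier effect on one feature
score_bb = w[0] + X_bb @ w[1:]
sex = rng.integers(0, 2, size=n_bb).astype(float)  # hypothetical covariate
beta_b, p_b = burden_test(score_bb, G, sex[:, None])
```

A burden test assumes that most collapsed variants push the phenotype in the same direction; variance-component tests such as SKAT relax that assumption.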

We propose a generalizable framework for data-driven target identification that is therapeutic-area-agnostic. This approach offers a novel opportunity to integrate clinical data into target identification via “back-translation,” drawing on clinical insights that have been underused in a research context. By bridging clinical and genetic data, our framework enhances the potential for discovering novel therapeutic targets and for advancing precision medicine.

Trial registration: ClinicalTrials.gov NCT02540993, NCT02545049

The online version contains supplementary material available at 10.1186/s12967-026-07806-x.

## Linked entities

- **Diseases:** Chronic Kidney Disease (MONDO:0005300)

## Full-text entities

- **Genes:** FNIP1 (folliculin interacting protein 1) [NCBI Gene 96459] {aka IMD93}, DNM3 (dynamin 3) [NCBI Gene 26052] {aka Dyna III}, PRKCE (protein kinase C epsilon) [NCBI Gene 5581] {aka PKCE, nPKC-epsilon}, COL4A3 (collagen type IV alpha 3 chain) [NCBI Gene 1285] {aka ATS2, ATS3, ATS3A, ATS3B, BFH2}, TNK2 (tyrosine kinase non receptor 2) [NCBI Gene 10188] {aka ACK, ACK-1, ACK1, p21cdc42Hs}, IL6 (interleukin 6) [NCBI Gene 3569] {aka BSF-2, BSF2, CDF, HGF, HSF, IFN-beta-2}, ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, HLA-C (major histocompatibility complex, class I, C) [NCBI Gene 3107] {aka D6S204, HLA-JY3, HLAC, HLC-C, MHC, PSORS1}, HLA-A (major histocompatibility complex, class I, A) [NCBI Gene 3105] {aka HLAA}, TGFB1 (transforming growth factor beta 1) [NCBI Gene 7040] {aka CAEND1, CED, DPD1, IBDIMDE, LAP, TGF-beta1}
- **Diseases:** lung cysts (MESH:D003560), polycystic kidney disease (MESH:D007690), Alport disease nephropathy (MESH:D007674), diabetic CKD (MESH:D003928), death (MESH:D003643), Alport syndrome (MESH:D009394), T2D (MESH:D003924), renal and cardiovascular diseases (MESH:D002318), renal tumors (MESH:D007680), hypertension (MESH:D006973), Birt-Hogg-Dube syndrome (MESH:D058249), diabetes (MESH:D003920), loss of hearing and eye abnormalities (MESH:D034381), CKD (MESH:D051436), focal and segmental glomerulonephritis (MESH:D005923), autosomal dominant inherited disease (MESH:D030342), kidney failure (MESH:D051437), end-stage kidney disease (MESH:D007676)
- **Chemicals:** LDL Ch (-), cholesterol (MESH:D002784), creatinine (MESH:D003404)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Mutations:** rs1260326

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12994227/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12994227/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/PMC12994227/full.md

---
Source: https://tomesphere.com/paper/PMC12994227