# Graph Representation Learning for the Prediction of Medication Usage in the UK Biobank Based on Pharmacogenetic Variants

**Authors:** Bill Qi, Yannis J. Trakadis

PMC · DOI: 10.3390/bioengineering12060595 · Bioengineering · 2025-05-31

## TL;DR

This study uses graph-based machine learning to predict medication usage from pharmacogenetic variants in UK Biobank genetic data, outperforming logistic regression and deep neural network baselines.

## Contribution

A novel application of graph representation learning, using a knowledge graph built from PharmGKB, improves the prediction of medication usage from pharmacogenetic data.
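To illustrate what "graph representation learning with a knowledge graph" involves, the sketch below builds an undirected graph from a few PharmGKB-style (head, relation, tail) triples and applies one graph convolution in the widely used Kipf and Welling form, ReLU(D^-1/2 (A + I) D^-1/2 H W). The triples, relation names, feature sizes, and random weights are hypothetical placeholders. Relation types are ignored, and this is a generic GCN layer, not the authors' architecture.

```python
import numpy as np

# Hypothetical PharmGKB-style triples; identifiers are illustrative, relation names are made up.
TRIPLES = [
    ("rs4244285", "variant_in_gene", "CYP2C19"),
    ("CYP2C19", "metabolizes", "clopidogrel"),
    ("rs1057910", "variant_in_gene", "CYP2C9"),
    ("CYP2C9", "metabolizes", "warfarin"),
    ("rs9923231", "variant_in_gene", "VKORC1"),
    ("VKORC1", "drug_target", "warfarin"),
]


def build_normalized_adjacency(triples):
    """Return (nodes, D^-1/2 (A + I) D^-1/2) for an undirected graph built from triples."""
    nodes = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    index = {name: i for i, name in enumerate(nodes)}
    a = np.zeros((len(nodes), len(nodes)))
    for head, _relation, tail in triples:
        # Relation types are ignored: the graph is treated as homogeneous and undirected.
        a[index[head], index[tail]] = a[index[tail], index[head]] = 1.0
    a_hat = a + np.eye(len(nodes))  # self-loops so each node keeps its own features
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return nodes, a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


nodes, a_norm = build_normalized_adjacency(TRIPLES)
rng = np.random.default_rng(0)
h0 = rng.normal(size=(len(nodes), 4))  # placeholder node features
w0 = rng.normal(size=(4, 2))           # placeholder weights; learned in practice
h1 = gcn_layer(a_norm, h0, w0)
print(len(nodes), h1.shape)  # 8 (8, 2)
```

Stacking such layers lets information flow between variants, genes, and drugs that are several hops apart in the graph.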

## Key findings

- Across 264 medications, the GCN model significantly outperformed logistic regression, a deep neural network, and a GCN trained on a random graph.
- Medications with larger sample sizes consistently showed better prediction performance.
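- A graph-based approach could help prioritize medications for an individual based on their genetic data; for several medications, a high relative rank was associated with more than 2-fold higher odds of usage.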
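The abstract reports that, for several medications, a high relative rank among multiple medications was associated with more than 2-fold higher odds of usage. The sketch below shows one way such an analysis could be computed. It ranks each individual's medications by predicted probability, flags whether a target medication falls in the top k, and takes the odds ratio of usage between flagged and unflagged individuals. The cutoff k, the toy probabilities, and the usage labels are assumptions for illustration, and the paper's exact definition of "high relative rank" may differ.

```python
import numpy as np


def top_k_flags(probs, med_index, k):
    """True where medication `med_index` is among an individual's k highest-scoring medications.

    probs: array of shape (n_individuals, n_medications) of predicted usage probabilities.
    """
    order = np.argsort(-probs, axis=1)   # medication indices, highest probability first
    ranks = np.argsort(order, axis=1)    # rank of each medication per individual (0 = top)
    return ranks[:, med_index] < k


def odds_ratio(exposed, outcome):
    """Odds ratio from a 2x2 table of boolean arrays (no continuity correction)."""
    a = np.sum(exposed & outcome)    # high rank, uses the medication
    b = np.sum(exposed & ~outcome)   # high rank, does not use it
    c = np.sum(~exposed & outcome)   # lower rank, uses it
    d = np.sum(~exposed & ~outcome)  # lower rank, does not use it
    return (a * d) / (b * c)         # inf (with a warning) if b or c is zero


# Toy predictions for 6 individuals x 3 medications (hypothetical values).
probs = np.array([
    [0.9, 0.2, 0.1],
    [0.8, 0.7, 0.3],
    [0.1, 0.6, 0.5],
    [0.2, 0.3, 0.9],
    [0.7, 0.1, 0.2],
    [0.3, 0.4, 0.8],
])
uses_med0 = np.array([True, False, True, False, True, False])  # hypothetical labels
flags = top_k_flags(probs, med_index=0, k=1)
print(odds_ratio(flags, uses_med0))  # 4.0
```

In a real analysis the odds ratio would be accompanied by a confidence interval or significance test, since small cell counts make the point estimate unstable.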

## Abstract

Ineffective treatment and side effects are associated with high burdens for the patient and society. We investigated the application of graph representation learning (GRL) for predicting medication usage based on individual genetic data in the United Kingdom Biobank (UKBB). A graph convolutional network (GCN) was used to integrate interconnected biomedical entities in the form of a knowledge graph as part of a machine learning (ML) prediction model. Data from The Pharmacogenomics Knowledgebase (PharmGKB) was used to construct a biomedical knowledge graph. Individual genetic data (n = 485,754) from the UKBB was obtained and preprocessed to match with pharmacogenetic variants in the PharmGKB. Self-reported medication usage labels were obtained from UKBB data field 20003. We hypothesize that pharmacogenetic variants can predict the impact of medications on individuals. We assume that an individual using a medication on a regular basis experiences a net benefit (vs. side-effects) from the medication. ML models were trained to predict medication usage for 264 medications. The GCN model significantly outperformed both a baseline logistic regression model (p-value: 1.53 × 10⁻⁹) and a deep neural network model (p-value: 8.68 × 10⁻⁸). The GCN model also significantly outperformed a GCN model trained using a random graph (GCN-random) (p-value: 5.44 × 10⁻⁹). A consistent trend of medications with higher sample sizes having better performance was observed, and for several medications, a high relative rank of the medication (among multiple medications) was associated with greater than 2-fold higher odds of usage of the medication. In conclusion, a graph-based ML approach could be useful in advancing precision medicine by prioritizing medications that a patient may need based on their genetic data. However, further research is needed to improve the quality and quantity of genetic data and to validate our approach using more reliable medication labels.
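The GCN-random comparison in the abstract tests whether the knowledge graph's structure, rather than the GCN's extra capacity, drives the improvement. This summary does not say how the random graph was generated. The sketch below assumes one simple control: keep the same nodes (and therefore the same node features) and the same number of undirected edges, but redraw the edges uniformly at random, in the style of a G(n, m) random graph. The function name and the toy adjacency matrix are hypothetical.

```python
import numpy as np


def random_graph_like(adj, seed=0):
    """Random undirected 0/1 adjacency with the same node count and edge count as `adj`.

    Assumptions: `adj` is symmetric, binary, and has no self-loops; edges are resampled
    uniformly over all node pairs, and node order (hence node features) is unchanged.
    """
    n = adj.shape[0]
    n_edges = int(np.triu(adj, k=1).sum())  # count each undirected edge once
    rows, cols = np.triu_indices(n, k=1)     # all candidate pairs with i < j
    rng = np.random.default_rng(seed)
    chosen = rng.choice(len(rows), size=n_edges, replace=False)
    rand = np.zeros_like(adj)
    rand[rows[chosen], cols[chosen]] = 1
    rand[cols[chosen], rows[chosen]] = 1     # mirror to keep the graph undirected
    return rand


# Toy knowledge-graph adjacency with 5 nodes and 4 undirected edges.
adj = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (1, 3)]:
    adj[i, j] = adj[j, i] = 1
rand = random_graph_like(adj, seed=42)
print(int(np.triu(rand, k=1).sum()), np.allclose(rand, rand.T))  # 4 True
```

A degree-preserving rewiring, which keeps each node's number of neighbors, would be a stricter alternative control, since it isolates the specific wiring from the degree distribution.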

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12189576/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12189576/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/PMC12189576/full.md

---
Source: https://tomesphere.com/paper/PMC12189576