# Leveraging Machine Learning to Predict Warfarin Sensitivity in the Puerto Rican Population: A Pharmacogenomic Approach

**Authors:** Jorge E. Martínez-Jiménez, Yolianne Ortega-Lampón, Dylan Cedres-Rivera, Frances Heredia-Negrón, Abiel Roche-Lima, Jorge Duconge

PMC · DOI: 10.3390/ijerph23030337 · International Journal of Environmental Research and Public Health · 2026-03-07

## TL;DR

This study uses machine learning to improve warfarin dosing predictions for Puerto Ricans, addressing challenges due to their admixed genetic background.

## Contribution

The study introduces a population-specific pharmacogenomic model for Puerto Ricans using machine learning to predict warfarin sensitivity.

## Key findings

- A gradient boosting classifier achieved the highest accuracy (0.7500) in predicting warfarin sensitivity.
- The study highlights the importance of including ethno-specific genetic variants in pharmacogenomic models.
- Despite the best model's overall accuracy of 0.7500, sensitivity for detecting warfarin-sensitive patients remains low.
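The gap between overall accuracy and sensitivity in the findings above can be illustrated with a small worked example. This is a minimal sketch, assuming a binary outcome (warfarin-sensitive vs. not). The confusion-matrix counts are invented for illustration and are not taken from the paper.

```python
# Hypothetical confusion-matrix counts for a binary classifier
# (positive class = warfarin-sensitive). These numbers are invented
# for illustration and are NOT reported in the paper.
tp, fn = 2, 10   # sensitive patients: correctly / incorrectly classified
tn, fp = 37, 3   # non-sensitive patients: correctly / incorrectly classified

total = tp + fn + tn + fp
accuracy = (tp + tn) / total      # share of all predictions that are correct
sensitivity = tp / (tp + fn)      # recall on the warfarin-sensitive class
specificity = tn / (tn + fp)      # recall on the non-sensitive class

print(f"accuracy={accuracy:.4f} "
      f"sensitivity={sensitivity:.4f} "
      f"specificity={specificity:.4f}")
```

With these made-up counts, three quarters of all predictions are correct, yet only 2 of the 12 sensitive patients are detected. When the sensitive class is a minority, accuracy is dominated by the majority class and can hide poor detection of the patients at highest risk.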

## Abstract

Public health relevance—How does this work relate to a public health issue?
Warfarin therapy is associated with substantial adverse drug events and hospitalizations in older adults. Genetic and clinical heterogeneity complicates safe warfarin dosing in admixed Hispanic populations.

Public health significance—Why is this work of significance to public health?
Many existing warfarin pharmacogenomic models have limited applicability across populations with differing genetic architectures. Incorporation of population-specific genetic variation improves classification of warfarin sensitivity.

Public health implications—What are the key implications or messages for practitioners, policymakers, and/or researchers?
Population-informed prediction models may improve the clinical management of anticoagulation therapy. Broader representation of genetic backgrounds is needed to enhance the generalizability of pharmacogenomic tools.

Warfarin remains one of the most widely used oral anticoagulants, even after the arrival of non-vitamin K oral anticoagulants. It has been implicated in approximately one-third of emergency hospitalizations for adverse drug events among older adults in national U.S. data. Warfarin dose requirements have been shown to vary between patients, reaching up to 10 times the standard dose. This variability is due to multiple factors such as age, gender, diet, body size, co-medications, and the patient’s genetic background, which accounts for 50% of warfarin dose variability among Europeans. These findings do not carry over to Caribbean Hispanic populations such as Puerto Ricans, who have an admixed genetic profile. In pharmacogenomics (PGx), machine learning (ML) has been used to predict individual drug responses by analyzing complex genetic and clinical data, which helps personalize medicine by tailoring treatments to a patient’s genetic makeup. Including ethno-specific variants has been shown to improve ML performance when models are applied to a specific population. This study compares eight ML methods to predict warfarin sensitivity in Puerto Rican Caribbean Hispanics. It is a secondary analysis of genetic and clinical data from 217 Puerto Rican patients treated with warfarin for thromboembolic disorders. After quality control filtering and exclusion of participant records with incomplete genetic and clinical data, 146 participants are retained for analysis. The data are split 65%/35% into training and test sets. Model performance is assessed by comparing precision and accuracy metrics computed from the corresponding confusion matrices. A gradient boosting classifier (GDB) achieves the highest overall accuracy (0.7500) and a weighted precision of 0.7642; however, sensitivity for detecting warfarin-sensitive patients remains low. Feature importance analysis suggests that rs202201137 could contribute to model predictions, although overall detection of warfarin-sensitive individuals remains limited.
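The 65%/35% split and the weighted precision metric mentioned in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline. The helper names `split_indices` and `weighted_precision` are hypothetical, rounding the test-set size up is an assumption (the abstract does not say how the split was rounded), and "weighted" is assumed to mean per-class precision averaged by each class's true count, as is common in ML libraries.

```python
import math
import random

def split_indices(n_records, test_fraction=0.35, seed=0):
    """Shuffle record indices and split them into train and test lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Assumption: the test-set size is rounded up.
    n_test = math.ceil(n_records * test_fraction)
    return indices[n_test:], indices[:n_test]

def weighted_precision(y_true, y_pred):
    """Average per-class precision, weighted by each class's true count."""
    total = 0.0
    for label in set(y_true):
        predicted = sum(1 for p in y_pred if p == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        support = sum(1 for t in y_true if t == label)
        # A class that is never predicted contributes a precision of 0.
        precision = correct / predicted if predicted else 0.0
        total += precision * support
    return total / len(y_true)

# 146 retained participants, as stated in the abstract.
train_idx, test_idx = split_indices(146)
print(len(train_idx), len(test_idx))

# Toy labels, invented for illustration (1 = sensitive, 0 = not sensitive).
toy_true = [1, 0, 0, 0]
toy_pred = [0, 0, 0, 1]
print(weighted_precision(toy_true, toy_pred))
```

Under these assumptions, 146 records yield 94 training and 52 test records. A stratified split that preserves the share of warfarin-sensitive patients in both sets would be a natural refinement, but the abstract does not state whether one was used.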

## Linked entities

- **Chemicals:** warfarin (PubChem CID 54678486)

## Full-text entities

- **Diseases:** thromboembolic disorders (MESH:D013923)
- **Chemicals:** vitamin K (MESH:D014812), Warfarin (MESH:D014859)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Mutations:** rs202201137

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13026543/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13026543/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/PMC13026543/full.md

---
Source: https://tomesphere.com/paper/PMC13026543