# Refining Drug-Induced Cholestasis Prediction: An Explainable Consensus Model Integrating Chemical and Biological Fingerprints

**Authors:** Palle S. Helmke, Gerhard F. Ecker

PMC · DOI: 10.1021/acs.jcim.4c02363 · 2025-05-27

## TL;DR

This paper presents a new computational model to predict drug-induced cholestasis using chemical and biological data, aiming to reduce animal testing in drug development.

## Contribution

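The novel contribution is an explainable consensus model that integrates PubChem substructure fingerprints with biological data (liver-expressed targets and pathways, plus nine hepatic transporter inhibition models) to improve cholestasis prediction.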

## Key findings

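- The best baseline model, combining PubChem substructure fingerprints, pathway data, and hepatic transporter inhibition predictions, achieved an MCC of 0.29 and a sensitivity of 0.79 in 10-fold cross-validation.
- The refined model, which uses an expanded consensus model with probability range filtering, reached an MCC of 0.38 and a sensitivity of 0.80.
- Feature importance analysis identified albumin as a potential target linked to cholestasis within the predictive model.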
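
For readers less familiar with the two reported metrics, the following minimal sketch shows how the Matthews correlation coefficient (MCC) and sensitivity are computed from confusion-matrix counts. The function names and the example counts are illustrative assumptions, not values from the paper; they are chosen only to show that a high sensitivity can coexist with a modest MCC on an imbalanced data set.

```python
from math import sqrt


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of actual positives predicted as positive."""
    total = tp + fn
    return tp / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal sum is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts (NOT from the paper): 100 positives, 900 negatives.
example = {"tp": 80, "fn": 20, "tn": 600, "fp": 300}

example_sensitivity = sensitivity(example["tp"], example["fn"])
example_mcc = mcc(**example)
```

With these made-up counts, sensitivity is 0.80 while MCC is only about 0.29, because the many false positives among the majority class pull the correlation down.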

## Abstract

Effective drug safety assessment, guided by the 3R principle (Replacement,
Reduction, Refinement) to minimize animal testing, is critical in
early drug development. Drug-induced liver injury (DILI), particularly
drug-induced cholestasis (DIC), remains a major challenge. This study
introduces a computational method for predicting DIC by integrating
PubChem substructure fingerprints with biological data from liver-expressed
targets and pathways, alongside nine hepatic transporter inhibition
models. To address class imbalance in the public cholestasis data
set, we employed undersampling, a technique that constructs a small
and robust consensus model by evaluating distinct subsets. The most
effective baseline model, which combined PubChem substructure fingerprints,
pathway data and hepatic transporter inhibition predictions, achieved
a Matthews correlation coefficient (MCC) of 0.29 and a sensitivity
of 0.79, as validated through 10-fold cross-validation. Subsequently,
target prediction using four publicly available tools was employed
to enrich the sparse compound-target interaction matrix. Although
this approach showed lower sensitivity compared to experimentally
derived targets and pathways, it highlighted the value of incorporating
specific systems biology related information. Feature importance analysis
identified albumin as a potential target linked to cholestasis within
our predictive model, suggesting a connection worth further investigation.
By employing an expanded consensus model and applying probability
range filtering, the refined method achieved an MCC of 0.38 and a
sensitivity of 0.80, thereby enhancing decision-making confidence.
This approach advances DIC prediction by integrating biological and
chemical descriptors, offering a reliable and explainable model.
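
The undersampling consensus and probability range filtering described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid base learner, the function names, the number of subsets, and the 0.4 to 0.6 uncertainty band are all hypothetical choices, and "probability range filtering" is interpreted here as abstaining on predictions whose averaged probability falls inside an uncertain range.

```python
import random
from statistics import mean
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], float]  # returns a pseudo-probability of class 1


def undersample_subsets(
    X: List[Vector], y: List[int], n_models: int, seed: int = 0
) -> Iterator[Tuple[List[Vector], List[int]]]:
    """Yield balanced subsets: all minority samples plus an equally
    sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    for _ in range(n_models):
        idx = minority + rng.sample(majority, len(minority))
        yield [X[i] for i in idx], [y[i] for i in idx]


def _distance(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def fit_centroid_model(X: List[Vector], y: List[int]) -> Model:
    """Toy base learner (an assumption, not the paper's classifier):
    scores a sample by its relative distance to the two class centroids."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [mean(col) for col in zip(*rows)]

    c_pos, c_neg = centroid(1), centroid(0)

    def predict_proba(x: Vector) -> float:
        d_pos, d_neg = _distance(x, c_pos), _distance(x, c_neg)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total

    return predict_proba


def consensus_predict(
    models: List[Model], x: Vector, low: float = 0.4, high: float = 0.6
) -> Tuple[float, Optional[int]]:
    """Average the per-model probabilities; abstain (label None) when the
    mean lies strictly inside the uncertain band (low, high)."""
    p = mean(m(x) for m in models)
    if low < p < high:
        return p, None
    return p, int(p >= high)


# Toy imbalanced data (hypothetical): 3 positives near (1, 1), 9 negatives near (0, 0).
X = [(1.0, 1.0), (0.9, 1.1), (1.1, 0.9),
     (0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (0.1, 0.1), (-0.1, -0.1),
     (0.2, 0.0), (0.0, 0.2), (-0.2, 0.0), (0.0, -0.2)]
y = [1, 1, 1] + [0] * 9

models = [fit_centroid_model(Xs, ys) for Xs, ys in undersample_subsets(X, y, n_models=5)]
```

Calling `consensus_predict(models, (1.0, 1.0))` yields a confident positive label, while a point midway between the two clusters such as `(0.5, 0.5)` falls inside the band and is returned with label `None`. Excluding such uncertain cases is one way a filtered model can report higher MCC on the remaining predictions, at the cost of not covering every compound.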

## Linked entities

- **Proteins:** LOC100189571 (uncharacterized LOC100189571)
- **Diseases:** drug-induced liver injury (MONDO:0005359)

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}
- **Diseases:** DILI (MESH:D056486), DIC (MESH:D000081015), cholestasis (MESH:D002779)

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12152943/full.md

---
Source: https://tomesphere.com/paper/PMC12152943