# Prediction of Permeability and Efflux Using Multitask Learning

**Authors:** Philip Ivers Ohlsson, Gian Marco Ghiandoni, Susanne Winiwarter, Rocío Mercado, Vigneshwari Subramanian

PMC · DOI: 10.1021/acsomega.5c04861 · 2025-11-05

## TL;DR

This paper benchmarks multitask graph neural networks against single-task models for predicting drug permeability and efflux, and the results suggest that multitask learning achieves higher accuracy.

## Contribution

Multitask graph neural networks augmented with molecular features such as pKa and LogD improve the accuracy of permeability and efflux prediction.

## Key findings

- Multitask learning models outperform single-task models in predicting permeability and efflux.
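- Adding molecular features, in particular pKa and LogD, improves accuracy on both permeability and efflux endpoints.
- The single-laboratory internal data set (over 10K compounds, about an order of magnitude larger than comparable public collections) provides greater statistical power and a consistent error profile.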
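
The first two findings can be pictured with a minimal NumPy sketch of hard parameter sharing: a molecular embedding is concatenated with extra descriptors, passed through a shared layer, and then read out by one head per endpoint. This is an illustration under stated assumptions, not the authors' architecture. The function `multitask_forward`, the layer sizes, the endpoint names, and the random arrays standing in for a graph neural network embedding and for pKa and LogD values are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def multitask_forward(embedding, extra_features, params):
    """Shared trunk followed by one linear head per endpoint (illustrative only)."""
    # Augment the learned molecular representation with extra descriptors.
    x = np.concatenate([embedding, extra_features], axis=1)
    # Shared hidden layer with ReLU: the part of the model all endpoints reuse.
    hidden = np.maximum(0.0, x @ params["W_shared"] + params["b_shared"])
    # One scalar prediction per compound for each endpoint.
    return {
        task: (hidden @ w + b).ravel()
        for task, (w, b) in params["heads"].items()
    }


# Hypothetical sizes and endpoint names, chosen only for the example.
n_compounds, emb_dim, n_extra, hidden_dim = 4, 8, 2, 16
tasks = ["permeability", "efflux"]

params = {
    "W_shared": rng.normal(size=(emb_dim + n_extra, hidden_dim)),
    "b_shared": np.zeros(hidden_dim),
    "heads": {t: (rng.normal(size=(hidden_dim, 1)), np.zeros(1)) for t in tasks},
}

embedding = rng.normal(size=(n_compounds, emb_dim))  # stand-in for a GNN output
extra = rng.normal(size=(n_compounds, n_extra))      # stand-in for pKa and LogD columns

predictions = multitask_forward(embedding, extra, params)
```

Because every head reads the same hidden layer, gradients from one endpoint would also update the representation used by the others, which is the usual mechanism behind information sharing in multitask models.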

## Abstract

In silico prediction of cell membrane permeability is crucial in drug discovery, since a compound’s permeation through membranes influences parameters such as its in vivo efficacy, bioavailability, and pharmacokinetics. This study investigates the use of multitask graph neural networks to predict a selection of permeability-related endpoints. The study utilized a harmonized, single-laboratory internal data set of over 10K compounds measured in human colorectal adenocarcinoma (Caco-2) and Madin–Darby canine kidney (MDCK) cell lines, which are routinely employed in experimental assays for drug permeability and efflux. This data set is an order of magnitude larger than comparable public collections, thus providing greater statistical power and a consistent error profile for model development. A series of multitask learning (MTL) models trained on these data were benchmarked against single-task approaches and evaluated on an external public data set to investigate the models’ applicability domain. The comparison between the performance of single- and multitask models suggests that MTL can achieve higher accuracy by leveraging shared information across endpoints. MTL is also shown to perform better when augmented with molecular features. In particular, the inclusion of pKa and LogD is shown to improve the accuracy of both permeability and efflux endpoints. This work presents benchmarking results of models utilizing different data splitting strategies, accompanied by guidelines for optimal validation in the context of MTL.
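
When several endpoints are trained together, a compound often has measurements for only some of them. The abstract does not say how missing labels are handled, so the following is only a sketch of one common approach: a per-endpoint mean squared error that skips missing entries, here encoded as NaN. The function name `masked_multitask_mse` and the toy arrays are hypothetical.

```python
import numpy as np


def masked_multitask_mse(y_true, y_pred):
    """Per-endpoint MSE over observed labels only; NaN marks a missing label."""
    mask = ~np.isnan(y_true)
    # Replace NaN by 0 before subtracting, then zero out the masked positions.
    sq_err = np.where(mask, (np.nan_to_num(y_true) - y_pred) ** 2, 0.0)
    counts = mask.sum(axis=0)
    # Guard against division by zero for endpoints with no labels at all.
    per_task = sq_err.sum(axis=0) / np.maximum(counts, 1)
    # Average only over endpoints that have at least one observed label.
    has_labels = counts > 0
    total = float(per_task[has_labels].mean()) if has_labels.any() else 0.0
    return per_task, total


# Toy example: 3 compounds (rows), 2 endpoints (columns), two labels missing.
y_true = np.array([[1.0, np.nan],
                   [2.0, 0.5],
                   [np.nan, 1.5]])
y_pred = np.array([[1.5, 0.0],
                   [2.0, 1.0],
                   [0.0, 1.0]])

per_task, total = masked_multitask_mse(y_true, y_pred)
```

Averaging per endpoint before averaging across endpoints keeps a sparsely measured endpoint from being drowned out by a densely measured one. Other weightings are possible, and the choice here is an assumption rather than something taken from the paper.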

## Linked entities

- **Diseases:** colorectal adenocarcinoma (MONDO:0005008)
- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Diseases:** colorectal adenocarcinoma (MESH:D003110)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** MDCK — Canis lupus familiaris (Dog), Spontaneously immortalized cell line (CVCL_0422), Caco-2 — Homo sapiens (Human), Colon adenocarcinoma, Cancer cell line (CVCL_0025)

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12631307/full.md

---
Source: https://tomesphere.com/paper/PMC12631307