# Multi-task deep learning models for mechanism-based prediction of developmental and reproductive toxicity (DART) using ToxCast bioassays

**Authors:** Siyeol Ahn, Hojun Jung, Jinwon Hwang, Donghyeon Kim, Hyunjun Kim, Wooseok Kim, Yunjung Lee, Changwon Lim, Jinhee Choi

PMC · DOI: 10.3389/ftox.2026.1751644 · Frontiers in Toxicology · 2026-02-04

## TL;DR

This paper introduces a deep learning model that predicts developmental and reproductive toxicity (DART) from in vitro ToxCast bioassay data, offering a faster and more ethical alternative to animal testing.

## Contribution

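A mechanism-informed deep learning framework for DART prediction from 23 ToxCast bioassays, in which the fine-tuned DGCL model outperforms baseline machine learning algorithms (random forest, XGB, GBT, decision tree, and logistic regression).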

## Key findings

- The DGCL model outperformed baseline machine learning algorithms in predicting DART.
- Extending DGCL to multi-task learning improved model stability and performance for endpoints with limited active data.
- External validation on 91 reference chemicals from the ECVAM ReProTect program showed balanced predictive performance (F1 = 0.68).

## Abstract

Developmental and reproductive toxicity (DART) testing has traditionally relied on animal studies, which are costly, time-consuming, and ethically constrained. To advance new approach methodologies (NAMs), we developed a mechanism-informed deep learning framework for predicting DART using in vitro bioactivity data from 23 ToxCast assays mechanistically linked to key developmental and reproductive pathways. Four state-of-the-art (SOTA) deep learning architectures (DGCL, TransFoxMol, MolPath, and MolFormer) were evaluated to address performance limitations commonly observed in traditional supervised learning approaches. Each model was fine-tuned using the curated ToxCast dataset, with the F1 score serving as the primary evaluation metric. Among these, the DGCL model consistently outperformed baseline machine learning algorithms, including random forest, XGB, GBT, decision tree, and logistic regression. Extending DGCL to a multi-task learning framework further improved model stability and performance for endpoints with limited active data. External validation with 91 reference chemicals curated and verified by the ECVAM ReProTect program demonstrated balanced predictive performance (F1 = 0.68), confirming the reliability and generalizability of the fine-tuned DGCL model. By leveraging advanced deep learning architectures, the model effectively handles mechanistically diverse and imbalanced assay data with limited active samples, resulting in improved predictive performance across DART-related effects. Overall, this study demonstrates the potential of integrating mechanistic bioassay information with deep learning to develop reliable, mechanism-based, and non-animal methods for DART prediction and potential regulatory application.
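The abstract names the F1 score as the primary evaluation metric and notes that the assay data are imbalanced, with few active samples. The sketch below is not the authors' evaluation code. It uses made-up labels for a hypothetical assay (2 actives among 20 chemicals) to illustrate why F1 on the active class can be more informative than accuracy in that setting. The function names and the data are assumptions introduced for illustration only.

```python
def f1_score(y_true, y_pred):
    """F1 for the active (positive) class, given 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correctly predicted actives: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced assay (illustrative data, not from the paper):
# 2 active chemicals followed by 18 inactive ones.
y_true = [1, 1] + [0] * 18

# A trivial model that calls every chemical inactive.
all_inactive = [0] * 20

# A model that finds one of the two actives and makes no false calls.
one_hit = [1, 0] + [0] * 18

for name, y_pred in [("all_inactive", all_inactive), ("one_hit", one_hit)]:
    print(name, round(accuracy(y_true, y_pred), 3), round(f1_score(y_true, y_pred), 3))
```

In this toy case the trivial all-inactive predictor reaches 90% accuracy but an F1 of 0, while the predictor that recovers one of the two actives scores an F1 of about 0.67. This is the kind of difference an accuracy-only comparison would hide.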

## Full-text entities

- **Genes:** PGR (progesterone receptor) [NCBI Gene 5241] {aka NR3C3, PR}, EREG (epiregulin) [NCBI Gene 2069] {aka EPR, ER, Ep}, ESR1 (estrogen receptor 1) [NCBI Gene 2099] {aka ER, ESR, ESRA, ESTRR, Era, NR3A1}, PLAUR (plasminogen activator, urokinase receptor) [NCBI Gene 5329] {aka CD87, U-PAR, UPAR, URKR}
- **Diseases:** cardiotoxicity (MESH:D066126), DART (MESH:D060737), DL (MESH:D007859), toxicity (MESH:D064420), endocrine (MESH:D004700)
- **Chemicals:** ICI182780 (MESH:D000077267), salts (MESH:D012492), Steroid (MESH:D013256), AD (-)
- **Species:** Homo sapiens (human, species) [taxon 9606], Danio rerio (leopard danio, species) [taxon 7955]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12912713/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12912713/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/PMC12912713/full.md

---
Source: https://tomesphere.com/paper/PMC12912713