# Establishment of an Integrated Model for Predicting Compound Mutagenicity with a Feature Importance Analysis

**Authors:** Chao-Hsu Yang, Tony Eight Lin, Jui-Hua Hsieh, Kai-Cheng Hsu, Pei-Te Chiueh

PMC · DOI: 10.1021/acs.jcim.5c01586 · Journal of Chemical Information and Modeling · 2025-10-21

## TL;DR

This study builds an integrated deep learning model that predicts whether chemical compounds are mutagenic from combined molecular descriptors and fingerprints, and finds that nitrogen-containing and ring-related substructures are associated with mutagenicity.

## Contribution

The novel contribution is an integrated deep learning framework combining diverse molecular features for mutagenicity prediction with feature importance analysis.

## Key findings

- The MACCS-Mordred model achieved a balanced accuracy of 0.885 and a precision of 0.922 on the 587-compound test set.
- Feature importance analysis showed nitrogen-containing and ring-related substructures are associated with mutagenic risk.
- Applicability domain analysis indicated that most compounds in the data set fall within the model's reliable prediction space.
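
For readers unfamiliar with the two reported metrics, the sketch below shows how balanced accuracy and precision are conventionally computed from a binary confusion matrix. The function names and the counts are hypothetical and chosen only for illustration; they are not the paper's confusion matrix, and the authors' evaluation code may differ.

```python
def balanced_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Mean of sensitivity (recall on mutagens) and specificity (recall on non-mutagens)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def precision(tp: int, fp: int) -> float:
    """Fraction of compounds predicted mutagenic that are truly mutagenic."""
    return tp / (tp + fp)


# Hypothetical counts for illustration only (NOT taken from the paper).
tp, fp, tn, fn = 90, 10, 80, 20
print(f"balanced accuracy: {balanced_accuracy(tp, fp, tn, fn):.3f}")
print(f"precision:         {precision(tp, fp):.3f}")
```

Balanced accuracy weights both classes equally, which makes it less sensitive than plain accuracy to an uneven ratio of mutagenic to non-mutagenic compounds.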

## Abstract

Assessing the mutagenicity of chemical compounds is crucial for ensuring their safety and minimizing potential environmental and public health risks. However, traditional mutagenicity assessments, such as the Ames test, are time-consuming, resource-intensive, and often limited in their capacity to screen a large number of compounds. To address this gap, predictive models powered by deep learning offer a promising alternative for rapid and cost-effective mutagenicity screening. In this study, we propose an integrated deep learning framework utilizing diverse molecular features to predict compound mutagenicity. Of the 5866 compounds in total, 5279 were used for model training and the remaining 587 for model evaluation. A total of 78 integrated models were developed by systematically combining 13 types of molecular descriptors and fingerprints. The MACCS-Mordred model demonstrated the best performance, achieving a balanced accuracy of 0.885 and a precision score of 0.922 on the testing data set. In addition, we performed an activity cliff analysis to examine potential sources of mispredictions. Applicability domain analysis further confirmed the robustness of the model, indicating that most compounds in our data set fell within the reliable prediction space. Notably, feature importance analysis revealed that mutagenic compounds are more likely to contain nitrogen-containing and ring-related substructures, offering insights into structural characteristics associated with mutagenic risk. Our results support AI-enabled screening tools for prioritizing hazardous compounds and improving early-stage chemical risk assessment. This work provides practical value for environmental monitoring and regulatory decision-making.
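
The figure of 78 models is consistent with pairing each of the 13 feature types with every other one, since C(13, 2) = 78, and the name "MACCS-Mordred" suggests such a pair. The abstract does not spell out the pairing scheme, so the sketch below treats it as an assumption. Only MACCS and Mordred are named in the abstract; the remaining feature names are placeholders.

```python
from itertools import combinations
from math import comb

# Only "MACCS" and "Mordred" appear in the abstract; the other 11 names
# are hypothetical placeholders standing in for the unnamed feature types.
feature_types = ["MACCS", "Mordred"] + [f"feature_{i}" for i in range(3, 14)]

# Assumption: each integrated model uses one unordered pair of feature types.
pairs = list(combinations(feature_types, 2))
print(len(feature_types), "feature types ->", len(pairs), "pairwise models")
print("C(13, 2) =", comb(13, 2))

# Data set sizes as stated in the abstract.
total_n, train_n, test_n = 5866, 5279, 587
print(f"test fraction: {test_n / total_n:.3f}")
```

Under this reading, the stated split of 5279 training and 587 test compounds sums to the 5866 total and corresponds to holding out roughly one tenth of the data.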

## Full-text entities

- **Chemicals:** nitrogen (MESH:D009584)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12606639/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12606639/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/PMC12606639/full.md

---
Source: https://tomesphere.com/paper/PMC12606639