# Advancing chemical safety prediction: an integrated GNN framework with DFT-augmented cyclic compound solution

**Authors:** Seul Lee, Jooyeon Lee, Unghwi Yoon, Jahyun Koo, Young Wook Yoon, Yoonjae Cho, Seung-Ryul Hwang, Keunhong Jeong

PMC · DOI: 10.1186/s13321-026-01151-3 · Journal of Cheminformatics · 2026-01-28

## TL;DR

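This paper introduces a GNN-based machine learning system that predicts three safety-critical properties (heat of combustion, vapor pressure, and flashpoint), improves heat-of-combustion accuracy for cyclic compounds with a hybrid DFT and Random Forest approach, and runs as a real-time tool for emergency response scenarios.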

## Contribution

A novel GNN-based framework with a hybrid DFT and Random Forest approach for cyclic compounds, integrated into a real-time prediction system.

## Key findings

- The unified model reached MAEs of 126 J/mol (R² = 0.993) for HoC, 0.617 log units (R² = 0.898) for VP, and 14.42 °C (R² = 0.839) for Flashpoint.
- A hybrid DFT and Random Forest approach improved HoC prediction for cyclic compounds, reaching an R² of 0.918 on a cyclic dataset expanded from 12 to 55 compounds.
- The system enables real-time prediction and comparison with industrial benchmarks for practical emergency response use.
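The two metrics quoted in these findings, mean absolute error (MAE) and the coefficient of determination (R²), can be computed as in the minimal sketch below. It uses the standard textbook definitions; the function names are illustrative, and the paper's own evaluation code is not shown in this summary.

```python
from math import fsum


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return fsum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = fsum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = fsum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measured values around their mean.
    ss_tot = fsum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot
```

MAE is reported in the units of the property itself (J/mol, log units, °C), while R² is unitless, which is why the findings quote both for each property.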

## Abstract

The rapid proliferation of chemical substances presents significant challenges in assessing their safety-critical physicochemical properties. This study presents an integrated approach using Graph Neural Networks (GNNs) to predict three crucial properties for chemical safety assessment: Heat of Combustion (HoC), Vapor Pressure (VP), and Flashpoint. Leveraging comprehensive datasets of 4780, 3573, and 14,696 compounds respectively, we developed a unified prediction model that outperforms existing approaches. Our model achieves mean absolute errors of 126 J/mol (R² = 0.993) for HoC, 0.617 log units (R² = 0.898) for VP, and 14.42 °C (R² = 0.839) for Flashpoint, representing notable improvements over conventional methods. Through detailed analysis, we identified and addressed a specific challenge in predicting HoC for cyclic compounds by implementing a hybrid approach combining DFT calculations and Random Forest modeling. This specialized treatment expanded our cyclic compound dataset from 12 to 55 compounds and achieved an R² of 0.918 for these traditionally challenging structures. The model was integrated into a real-time prediction system using Flask, allowing users to input chemical structures through SMILES notation or direct drawing. The system includes features for comparing predictions with experimental data and benchmarking against common industrial chemicals (acetone, n-hexane, and n-decane), enhancing its practical utility in emergency response scenarios. Our approach provides a robust, unified solution for predicting multiple safety-critical properties simultaneously, addressing a crucial need in chemical safety assessment and emergency response planning.

Overall, this study provides an integrated framework that deploys three GNN-based prediction models within a common architecture and a real-time prediction system. For cyclic compounds, which exhibit systematic prediction challenges under the GNN framework, we incorporate a targeted alternative modeling strategy to improve predictive reliability, thereby enhancing the practical applicability of machine-learning approaches to chemical safety assessment.
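One way to picture the targeted strategy for cyclic compounds is as a dispatch step placed in front of the HoC predictor. The sketch below is hypothetical: `has_ring_closure`, `predict_hoc`, and the two model callables are illustrative names, and the ring check is a plain-text SMILES heuristic rather than the detection method used in the paper, which this summary does not describe.

```python
def has_ring_closure(smiles: str) -> bool:
    """Heuristic ring check on a SMILES string.

    Outside square brackets, digits (and the '%' prefix for two-digit
    labels) are ring-closure labels. Digits inside brackets encode
    isotopes, hydrogen counts, or charges and are ignored.
    """
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and (ch.isdigit() or ch == "%"):
            return True
    return False


def predict_hoc(smiles, gnn_model, cyclic_model):
    """Route cyclic structures to the specialized model (assumed design)."""
    # Both models are placeholders: any callable taking a SMILES string.
    model = cyclic_model if has_ring_closure(smiles) else gnn_model
    return model(smiles)
```

With this routing, an acyclic input such as acetone (`CC(C)=O`) would go to the GNN, while cyclohexane (`C1CCCCC1`) would go to the cyclic-compound model. A production system would more likely use a cheminformatics toolkit for ring perception than a string scan.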

The online version contains supplementary material available at 10.1186/s13321-026-01151-3.

## Linked entities

- **Chemicals:** acetone (PubChem CID 180), n-hexane (PubChem CID 8058), n-decane (PubChem CID 15600)

## Full-text entities

- **Chemicals:** n-hexane (MESH:C026385), n-decane (MESH:C012867), acetone (MESH:D000096)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12922296/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12922296/full.md

## References

5 references — full list in the complete paper: https://tomesphere.com/paper/PMC12922296/full.md

---
Source: https://tomesphere.com/paper/PMC12922296