# Transferable and Transparent Energy Decomposition-Based Machine Learning Models for Computing Accurate Reaction Energetics

**Authors:** Carlos R. Jacinto-Mejía, Loriano Storchi, Giovanni Bistoni

PMC · DOI: 10.1021/acs.jctc.5c01184 · Journal of Chemical Theory and Computation · 2025-10-29

## TL;DR

This paper introduces a machine-learning framework that corrects DFT reaction energies toward CCSD(T)/CBS reference values using interpretable energy-decomposition descriptors.

## Contribution

A transferable and interpretable ML framework using energy decomposition to enhance DFT reaction energies with high accuracy and robust generalization.

## Key findings

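- A general-purpose linear regression (LR) model reduces MAPE relative to CCSD(T)/CBS reference values by up to 63% compared to uncorrected DFT.
- A random forest (RF) classifier that selects the optimal LR model for each case raises the MAPE reduction to as much as 123 percentage points.
- The framework maintains robust performance on the out-of-distribution WCCR10 data set, whose transition-metal complexes were absent from training.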
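
The findings above quote one reduction as a percentage and the other in percentage points, and the two are not interchangeable. The following minimal sketch shows the standard MAPE definition and both ways of reporting a reduction. The function names and the energies are invented for illustration and are not taken from the paper; which convention the 63% figure follows is not stated in this summary.

```python
def mape(predicted, reference):
    """Mean absolute percentage error, in percent, relative to the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    terms = [abs((p - r) / r) for p, r in zip(predicted, reference)]
    return 100.0 * sum(terms) / len(terms)


def relative_reduction(mape_before, mape_after):
    """Reduction expressed as a share of the starting MAPE, in percent."""
    return 100.0 * (mape_before - mape_after) / mape_before


def point_reduction(mape_before, mape_after):
    """Reduction expressed as a plain difference, in percentage points."""
    return mape_before - mape_after


# Synthetic reaction energies (kcal/mol), made up for this example only.
reference = [10.0, -20.0, 5.0, 40.0]      # stands in for CCSD(T)/CBS values
uncorrected = [12.0, -23.0, 7.0, 44.0]    # stands in for plain DFT values
corrected = [10.5, -21.0, 5.5, 41.0]      # stands in for ML-corrected values

before = mape(uncorrected, reference)
after = mape(corrected, reference)

print(f"MAPE before: {before:.2f}%  after: {after:.2f}%")
print(f"relative reduction: {relative_reduction(before, after):.1f}%")
print(f"point reduction: {point_reduction(before, after):.2f} percentage points")
```

Because MAPE cannot be negative, a point reduction can never exceed the starting MAPE. A reduction of 123 percentage points therefore implies that the uncorrected MAPE on that set was above 123%.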

## Abstract

We present a transferable, interpretable, and modular machine-learning framework that enhances the accuracy of density functional theory (DFT) reaction energies using physically meaningful energy-decomposition descriptors. Reaction energies computed at the DFT level with standard basis sets are first decomposed into chemically intuitive contributions, such as kinetic and potential energy, which are then used to train a library of linear regression (LR) models. This includes a general-purpose model that reduces mean absolute percentage errors (MAPE) relative to gold standard CCSD(T)/CBS reference values by up to 63% compared to uncorrected DFT across extended benchmark sets. In parallel, a series of specialized LR models provide improved accuracy for specific reaction classes. A random forest (RF) classifier dynamically selects the optimal model for each case, pushing accuracy further and achieving a MAPE reduction of up to 123 percentage points, all while maintaining full model interpretability. In a rigorous out-of-distribution stress test on the WCCR10 data set (containing transition-metal complexes absent from training), both the general LR model and the RF/LR pipeline retain robust performance. Unlike typical neural network models, which often face generalization challenges beyond their training set, our framework maintains stable performance outside its training domain.
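
The workflow in the abstract (decompose, fit a library of linear models, route each case to one of them) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the two descriptors, the synthetic data, the choice of the reference energy as regression target, and the `selector` callable that stands in for the random forest classifier are all hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(features, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, feature):
    """Intercept plus weighted sum of the decomposition descriptors."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], feature))


def corrected_energy(feature, selector, library):
    """Route one reaction to a model; fall back to the general-purpose one."""
    key = selector(feature)  # placeholder for the RF classifier's choice
    return predict(library.get(key, library["general"]), feature)


# Synthetic descriptors: (kinetic, potential) parts of a DFT reaction energy.
features = [(100.0, -110.0), (50.0, -45.0), (-30.0, 20.0),
            (80.0, -70.0), (-10.0, 25.0)]
# Synthetic "reference" energies built from a known linear rule (assumption).
reference = [0.5 + 1.0 * t + 0.9 * v for t, v in features]

library = {"general": fit_linear(features, reference)}
value = corrected_energy((60.0, -50.0), lambda f: "general", library)
print(library["general"], value)
```

The fitted coefficients are what keeps such a model transparent: each one states how strongly a given energy component is rescaled. Specialized models for particular reaction classes would be added to `library` under their own keys, with the classifier returning the key to use.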

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12613314/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12613314/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC12613314/full.md

---
Source: https://tomesphere.com/paper/PMC12613314