# Cut-SOAP: A Machine Learning Descriptor for Rapid Screening of Molecular Adsorption Energetics

**Authors:** Felipe V. Calderan, Karla F. Andriani, Priscilla Felício-Sousa, Gabriel A. Pinheiro, Juarez L. F. Da Silva, Marcos G. Quiles

PMC · DOI: 10.1021/acsomega.5c10055 · 2026-01-26

## TL;DR

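This paper introduces Cut-SOAP, a compact modification of the SOAP descriptor, and a machine learning pipeline built on it that quickly and accurately predicts molecular adsorption energetics at a fraction of the computational cost of quantum chemistry methods such as density functional theory.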

## Contribution

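The Cut-SOAP descriptor, a modified version of the Smooth Overlap of Atomic Positions (SOAP) descriptor, reduces feature dimensionality by more than 97% while preserving most of the information needed for accurate adsorption energy prediction.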
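To make the "more than 97%" figure concrete, the sketch below counts the length of a full SOAP power-spectrum vector and the largest feature count that would still meet a 97% cut. It assumes the full power spectrum keeps every cross-species term and counts each symmetric pair of (species, radial) channels once. The species count and the `n_max`/`l_max` values are hypothetical, because this summary does not give the paper's settings, and nothing here reproduces how Cut-SOAP actually selects its features.

```python
def soap_power_spectrum_length(n_species: int, n_max: int, l_max: int) -> int:
    """Length of a full SOAP power-spectrum vector.

    Assumption: all cross-species terms are kept and each unordered pair of
    combined (species, radial) channels is counted once, for every l = 0..l_max.
    """
    n_channels = n_species * n_max                # combined (species, radial) index
    n_pairs = n_channels * (n_channels + 1) // 2  # unordered pairs, diagonal included
    return n_pairs * (l_max + 1)                  # one entry per angular channel l


def max_length_for_reduction(full_length: int, reduction_pct: int) -> int:
    """Largest feature count whose reduction is at least `reduction_pct` percent."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return full_length * (100 - reduction_pct) // 100


def reduction_percent(full_length: int, reduced_length: int) -> float:
    """Percentage of features removed by a reduced descriptor."""
    return 100.0 * (1.0 - reduced_length / full_length)


# Hypothetical settings: 4 chemical species, n_max = 8, l_max = 6.
full = soap_power_spectrum_length(n_species=4, n_max=8, l_max=6)
cut = max_length_for_reduction(full, reduction_pct=97)
print(full, cut)                               # 3696 110
print(round(reduction_percent(full, cut), 2))  # 97.02
```

With these illustrative settings the full vector has 3,696 entries, so any reduced descriptor of 110 or fewer features would correspond to a reduction above 97%.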

## Key findings

- Cut-SOAP reduces feature dimensionality by over 97% without significant loss of data quality.
- The deep neural network model achieves a mean absolute error below 0.1 eV on the standard test set.
- The model maintains robust performance with a mean absolute error below 1.0 eV on out-of-distribution data.
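The two headline numbers are mean absolute errors (MAE) measured on different data splits. The sketch below shows that evaluation step with NumPy. The energy values are invented placeholders rather than data from the paper; only the 0.1 eV and 1.0 eV thresholds come from the reported results.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred) -> float:
    """MAE in the same unit as the inputs (eV here)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))


# Placeholder energies in eV (illustrative only, not from the paper).
test_true, test_pred = [-1.20, -0.85, -2.10, -0.40], [-1.15, -0.92, -2.02, -0.44]
ood_true, ood_pred = [-3.10, -0.20, -4.75], [-2.40, -0.95, -4.10]

# Thresholds reported for the standard test set and the out-of-distribution set.
thresholds_ev = {"standard test": 0.1, "out-of-distribution": 1.0}
results = {
    "standard test": mean_absolute_error(test_true, test_pred),
    "out-of-distribution": mean_absolute_error(ood_true, ood_pred),
}

for split, mae in results.items():
    status = "below" if mae < thresholds_ev[split] else "above"
    print(f"{split}: MAE = {mae:.3f} eV ({status} {thresholds_ev[split]} eV)")
```

Reporting the MAE separately for each split is what distinguishes accuracy on data resembling the training set from generalization to out-of-distribution systems.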

## Abstract

Adsorption energy is a fundamental property in catalysis and chemical reaction studies; however, conventional quantum chemistry methods, such as density functional theory, provide high accuracy but are often computationally expensive or even impractical for screening large data sets or complex chemical systems. In this work, we propose a machine learning (ML) pipeline that efficiently predicts relative energy interactions for molecular adsorption near the minimum molecule–cluster distance, at a fraction of the computational cost of quantum chemistry-based methods. Our approach begins by transforming the Fritz Haber Institute ab initio materials simulation (FHI-aims) output data into feature arrays through a modified version of the Smooth Overlap of Atomic Positions (SOAP) descriptor, which we call Cut-SOAP. This modification reduces the dimensionality of the features by more than 97% while preserving most of the inherent quality of the data. With this method, we construct a large adsorption data set of more than 430,000 entries using real-world data. We then train a deep neural network on this data set and analyze how architectural and hyperparameter choices affect both computational cost and predictive accuracy. The model achieves a mean absolute error below 0.1 eV on the standard test set. To rigorously assess its generalization for real-world applications, we evaluate it on a challenging out-of-distribution data set, where it maintains a robust mean absolute error below 1.0 eV. The trained model can make thousands of predictions in seconds, demonstrating the effectiveness of the pipeline for rapid screening. These results highlight the benefits of ML-based approaches for materials screening, which offer accessible, efficient, and accurate tools for predicting relative energy interactions. This capability is a crucial step toward the accelerated discovery and optimization of catalytic systems.
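The ability to make thousands of predictions in seconds follows from the fact that a trained feed-forward network evaluates a whole batch of descriptor vectors with a few matrix multiplications. The sketch below assumes a plain multilayer perceptron with ReLU hidden layers and a linear energy output. The input length, the layer widths, and the random, untrained weights are all hypothetical, because the architecture the paper selects is not given in this summary.

```python
import time

import numpy as np


def init_mlp(layer_sizes, seed=0):
    """Random (untrained) weights for a fully connected network; sizes are hypothetical."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out))  # He-style init
        b = np.zeros(n_out)
        params.append((W, b))
    return params


def predict(params, X):
    """Batched forward pass: ReLU hidden layers, linear output, one energy per row of X."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return (h @ W + b).ravel()


n_features = 128  # hypothetical length of a reduced descriptor vector
params = init_mlp([n_features, 256, 128, 1])
X = np.random.default_rng(1).normal(size=(10_000, n_features))  # stand-in descriptors

start = time.perf_counter()
energies = predict(params, X)
elapsed = time.perf_counter() - start
print(energies.shape, f"{elapsed:.4f} s")
```

A network of this size should typically evaluate the 10,000 rows in well under a second, although timing depends on the hardware and the real model's size. The sketch also leaves out the cost of building descriptors from FHI-aims outputs, and a real pipeline would load trained weights instead of initializing random ones.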

## Figures

The complete paper contains 15 figures with captions: https://tomesphere.com/paper/PMC12902971/full.md

---
Source: https://tomesphere.com/paper/PMC12902971