# Assessing the performance of quantum-mechanical descriptors in physicochemical and biological property prediction

**Authors:** Alejandra Hinostroza Caldas, Artem Kokorin, Alexandre Tkatchenko, Leonardo Medrano Sandonas

PMC · DOI: 10.1039/d5dd00411j · Digital Discovery · 2026-01-19

## TL;DR

This paper introduces QUED (QUantum Electronic Descriptor), a machine learning framework that combines structural and electronic molecular data to predict physicochemical and biological properties of drug-like molecules.

## Contribution

The QUED framework integrates quantum-mechanical and geometric descriptors to enhance ML model accuracy and interpretability for property prediction.
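As a rough illustration of what "integrating" the two descriptor types can mean, the sketch below concatenates a geometric feature vector with an electronic one and fits a kernel ridge regression, one of the two model families named in the abstract. Everything concrete here is an assumption made for illustration: the toy numbers, the Gaussian kernel, the `sigma` and `ridge` values, and the helper names (`gaussian_kernel`, `solve_linear_system`, `fit_krr`, `predict_krr`) are hypothetical and are not taken from the paper. The linear solver is written in plain Python only to keep the example self-contained.

```python
import math

def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def fit_krr(features, targets, sigma, ridge):
    """Return KRR weights alpha solving (K + ridge * I) alpha = y."""
    n = len(features)
    kernel = [
        [gaussian_kernel(features[i], features[j], sigma) + (ridge if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return solve_linear_system(kernel, targets)

def predict_krr(train_features, weights, query, sigma):
    """Predict as a kernel-weighted sum over the training points."""
    return sum(w * gaussian_kernel(x, query, sigma)
               for w, x in zip(weights, train_features))

# Hypothetical toy data: four "molecules", two geometric and two electronic features each.
geometric = [[0.1, 0.9], [0.4, 0.5], [0.8, 0.2], [0.3, 0.7]]
electronic = [[-0.30, 0.05], [-0.25, 0.02], [-0.20, 0.01], [-0.28, 0.04]]
targets = [1.0, 2.0, 3.0, 1.5]

# Combined representation: plain concatenation of the two descriptor vectors.
combined = [g + e for g, e in zip(geometric, electronic)]

SIGMA, RIDGE = 0.3, 1e-6  # assumed hyperparameters, not tuned
weights = fit_krr(combined, targets, SIGMA, RIDGE)
fitted = [predict_krr(combined, weights, x, SIGMA) for x in combined]
```

In practice one would likely use an established library implementation and tune the kernel width and regularization by cross-validation; the point of the sketch is only that the geometric and electronic parts enter the model as a single feature vector.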

## Key findings

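- Adding QM descriptors notably improves ML model accuracy for physicochemical property prediction on the QM7-X dataset.
- For toxicity and lipophilicity, QM properties provide some predictive value; a SHAP analysis ranks molecular orbital energies and DFTB energy components among the most influential electronic features.
- QUED is validated on QM7-X (equilibrium and non-equilibrium conformations of small drug-like molecules) and on the TDCommons-LD50 and MoleculeNet benchmark datasets.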
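The feature-importance finding rests on the SHAP analysis mentioned in the abstract. As a minimal sketch of the underlying idea, the code below computes exact Shapley values for a tiny model by enumerating all feature subsets, replacing "absent" features with baseline values. The model, the feature names, and all numbers are hypothetical and chosen only for illustration; the exhaustive enumeration is practical only for a handful of features and is not the algorithm used by the SHAP library.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a subset take their baseline value."""
    n = len(instance)

    def value(subset):
        mixed = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi

# Hypothetical feature names and toy model (not fitted to any data).
feature_names = ["homo_energy", "lumo_energy", "dipole_moment"]

def toy_model(x):
    # Arbitrary function with one interaction term between features 0 and 2.
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[0] * x[2]

instance = [-0.25, 0.05, 1.2]   # molecule being explained (made-up values)
baseline = [-0.30, 0.02, 0.8]   # reference values, e.g. dataset means (assumed)

phi = shapley_values(toy_model, instance, baseline)
for name, contribution in zip(feature_names, phi):
    print(f"{name}: {contribution:+.4f}")
```

The attributions sum to the difference between the model output at `instance` and at `baseline`, which is the property that makes a ranking of features by attribution magnitude interpretable.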

## Abstract

Machine learning (ML) approaches have drastically advanced the exploration of structure–property and property–property relationships in computer-aided drug discovery. A central challenge in this field is the identification of molecular descriptors that can effectively capture both geometric- and electronic structure-derived features, enabling the development of reliable and interpretable predictive models. While numerous descriptors focusing solely on structural characteristics have been recently proposed, improvements in model accuracy often come at the cost of increased computational demands, thereby restricting their practical applicability. To address this challenge, we introduce the “QUantum Electronic Descriptor” (QUED) framework, which integrates both structural and electronic data of molecules to develop ML regression models for property prediction. In doing so, a quantum-mechanical (QM) descriptor is derived from molecular and atomic properties computed using the semi-empirical density functional tight-binding (DFTB) method, which allows for efficient modelling of both small and large drug-like molecules. This descriptor is combined with inexpensive geometric descriptors—capturing two-body and three-body interatomic interactions—to form comprehensive molecular representations used to train Kernel Ridge Regression and XGBoost models. As a proof of concept, we validate QUED using the QM7-X dataset, which comprises equilibrium and non-equilibrium conformations of small drug-like molecules, demonstrating that incorporating electronic structure data notably enhances the accuracy of ML models for predicting physicochemical properties. For biological endpoints, we find that QM properties provide some predictive value for toxicity and lipophilicity prediction, as assessed using the TDCommons-LD50 and the MoleculeNet benchmark datasets. 
Moreover, a SHapley Additive exPlanations (SHAP) analysis of the toxicity and lipophilicity predictive models reveals that molecular orbital energies and DFTB energy components are among the most influential electronic features. Hence, our work underscores the importance of incorporating QM descriptors to enhance both the accuracy and interpretability of ML models for predicting multiple properties relevant to pharmaceutical and biological applications.

We present QUED, a QM/ML framework that combines structural and electronic molecular information to build regression models for physicochemical and biological property prediction. Our work highlights the value of QM data for reliable and interpretable models.

## Full-text entities

- **Diseases:** toxicity (MESH:D064420)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12820757/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12820757/full.md

## References

98 references — full list in the complete paper: https://tomesphere.com/paper/PMC12820757/full.md

---
Source: https://tomesphere.com/paper/PMC12820757