# Predicting electronic properties of molecules: a stacking ensemble model for HOMO and LUMO energy estimation

**Authors:** Omid Mahmoudi, Mi-hyun Kim

PMC · DOI: 10.1039/d5ra08007j · RSC Advances · 2026-03-04

## TL;DR

This paper introduces HLP-Stack, a stacking ensemble machine learning model that predicts the HOMO and LUMO energies of small organic molecules from 2D/3D molecular descriptors computed on the QM9 dataset.

## Contribution

The main contribution is HLP-Stack (HOMO–LUMO predictor via stacking), a stacking ensemble whose combined predictions outperform each of its individual baseline models in predicting HOMO and LUMO energies.
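The general stacking idea can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not HLP-Stack itself: the two base learners (a one-descriptor least-squares line and a k-nearest-neighbour average), the no-intercept linear meta-learner, the round-robin K-fold split, and the synthetic data are all hypothetical choices; the paper's actual base models and meta-learner are described in the full text. The essential step is that the meta-learner is fitted on out-of-fold base predictions, so it learns how to weight base models on samples they were not trained on.

```python
import math
from statistics import mean


def fit_ols(xs, ys):
    """Hypothetical base learner 1: least-squares line y = a + b*x on one descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=3):
    """Hypothetical base learner 2: average target of the k nearest training points."""
    data = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


BASE_LEARNERS = (fit_ols, fit_knn)


def fit_meta(p1, p2, ys):
    """Meta-learner: least-squares weights (w1, w2), no intercept, for y ~ w1*p1 + w2*p2."""
    s11 = sum(a * a for a in p1)
    s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    s1y = sum(a * y for a, y in zip(p1, ys))
    s2y = sum(b * y for b, y in zip(p2, ys))
    det = s11 * s22 - s12 * s12
    if abs(det) <= 1e-12 * s11 * s22:
        # Base predictions are (nearly) collinear: fall back to rescaling one of them.
        return (s1y / s11, 0.0) if s11 else (0.0, s2y / s22 if s22 else 0.0)
    # Solve the 2x2 normal equations by Cramer's rule.
    w1 = (s1y * s22 - s2y * s12) / det
    w2 = (s2y * s11 - s1y * s12) / det
    return w1, w2


def stack_fit(xs, ys, n_folds=5):
    """Fit the stack; return (predict, meta_weights, out_of_fold_base_predictions)."""
    n = len(xs)
    oof = [[0.0] * n for _ in BASE_LEARNERS]
    for fold in range(n_folds):
        # Round-robin split: every n_folds-th sample is held out in this fold.
        held_out = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for m, fit in enumerate(BASE_LEARNERS):
            model = fit(tx, ty)
            for i in held_out:
                oof[m][i] = model(xs[i])
    # The meta-learner only ever sees predictions made on held-out samples.
    w1, w2 = fit_meta(oof[0], oof[1], ys)
    # Refit the base learners on all data for use at prediction time.
    final = [fit(xs, ys) for fit in BASE_LEARNERS]

    def predict(x):
        return w1 * final[0](x) + w2 * final[1](x)

    return predict, (w1, w2), oof


if __name__ == "__main__":
    # Synthetic one-descriptor data standing in for a descriptor -> orbital-energy relation.
    xs = [i / 10 for i in range(40)]
    ys = [-0.25 + 0.02 * x + 0.01 * math.sin(3 * x) for x in xs]
    predict, weights, _ = stack_fit(xs, ys)
    print("meta-learner weights:", weights)
    print("stacked prediction at x = 1.05:", predict(1.05))
```

Because the meta-weights minimise squared error over the out-of-fold predictions, and using either base model alone is one of the candidate weightings, the blended out-of-fold error cannot exceed that of the better base model on the same folds.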

## Key findings

- HLP-Stack achieved R² ≈ 0.9999 on the test set for both HOMO and LUMO energies, with RMSE ≈ 3.219 × 10⁻⁴ Eh for HOMO and ≈ 1.903 × 10⁻⁴ Eh for LUMO.
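- Feature selection (SelectKBest with mutual information regression) identified the most influential descriptors, and correlation analysis confirmed that these descriptors do not trivially encode the HOMO or LUMO energies.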
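- SHAP Tree Explainer analysis quantified each feature's contribution to the predictions, and analysis of molecular topology and functional groups showed how aromaticity, ring structures, and functionalization affect electronic properties.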
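The two headline metrics can be computed from their standard definitions, RMSE = √(mean squared residual) and R² = 1 − SS_res/SS_tot; the following is a minimal sketch, not the authors' evaluation code. It also converts the reported test-set RMSE values from Hartree to meV using the CODATA 2018 Hartree energy (1 Eh ≈ 27.2114 eV), which puts them at roughly 8.76 meV for HOMO and 5.18 meV for LUMO; that conversion constant is not taken from the paper.

```python
import math

# CODATA 2018 value of the Hartree energy expressed in electronvolts.
HARTREE_TO_EV = 27.211386245988


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (undefined if y_true is constant)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Reported test-set RMSE values (Hartree) from the abstract, expressed in meV.
    for target, rmse_eh in (("HOMO", 3.219e-4), ("LUMO", 1.903e-4)):
        print(f"{target}: {rmse_eh:.3e} Eh = {rmse_eh * HARTREE_TO_EV * 1e3:.2f} meV")
```

The sketch does not guard against a constant `y_true`, where SS_tot is zero and R² is undefined.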

## Abstract

The energies of the Highest Occupied Molecular Orbital (HOMO) and Lowest Unoccupied Molecular Orbital (LUMO) are important determinants of molecular reactivity and stability. Traditional quantum chemical (QC) methods for calculating HOMO and LUMO energies are costly and computationally time-consuming. Machine learning is therefore well positioned to accelerate QC predictions. In this study, we report a stacking ensemble model named HLP-Stack (HOMO–LUMO predictor via stacking) that predicts these energies from molecular descriptors computed on the QM9 dataset. By combining 2D/3D descriptors of the QM9 dataset, the stacking ensemble achieved robust predictive performance superior to that of any single model. On the test set it reached R² ≈ 9.999 × 10⁻¹ and RMSE ≈ 3.219 × 10⁻⁴ Hartree (Eh) for HOMO, and R² ≈ 9.999 × 10⁻¹ and RMSE ≈ 1.903 × 10⁻⁴ Eh for LUMO, outperforming individual baseline models. Feature selection using the SelectKBest algorithm with mutual information regression identified the most influential descriptors. To ensure these descriptors did not trivially encode HOMO or LUMO energies, we performed correlation analysis between each descriptor and the target properties. SHAP Tree Explainer analysis further revealed the contribution of each feature to model predictions. In addition, analysis of molecular topology and functional groups highlighted trends in aromaticity and ring structures and their impact on electronic behavior. Finally, HOMO–LUMO gap analysis demonstrated how molecular structure and functionalization affect electronic properties.

A stacking ensemble model for HOMO and LUMO energy estimation.
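The abstract's leakage check (correlating each descriptor with the target to make sure it does not trivially encode the HOMO or LUMO energy) can be sketched as a Pearson-correlation screen. This is an illustrative sketch only: the descriptor names, the made-up toy values, and the |r| ≥ 0.95 cut-off are hypothetical, and the paper's actual correlation analysis and any thresholds it used are given in the full text.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return cov / (sx * sy)


def flag_leaky_descriptors(descriptors, target, threshold=0.95):
    """Return {name: r} for descriptors whose |r| with the target reaches the (hypothetical) threshold."""
    flagged = {}
    for name, column in descriptors.items():
        r = pearson_r(column, target)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged


if __name__ == "__main__":
    # Made-up HOMO energies (Hartree) and two hypothetical descriptor columns.
    homo = [-0.25, -0.23, -0.27, -0.24, -0.26]
    descriptors = {
        "ring_count": [1, 0, 2, 1, 0],        # plausible structural descriptor
        "leaky_copy": [2 * h for h in homo],  # deliberately encodes the target
    }
    for name, column in descriptors.items():
        print(f"{name}: r = {pearson_r(column, homo):+.3f}")
    print("flagged:", sorted(flag_leaky_descriptors(descriptors, homo)))
```

A Pearson screen only detects linear dependence, so a descriptor that encodes the target non-linearly could still pass it.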

## Full-text entities

- **Chemicals:** alcohol (MESH:D000438), hydrogens (MESH:D006859), halogens (MESH:D006219), alkyne (MESH:D000480), anthracene (MESH:C034020), amine (MESH:D000588), -CnHm (no MeSH ID), ethers (MESH:D004987), alkane (MESH:D000473), naphthalene (MESH:C031721), PAH (MESH:D011084), benzene (MESH:D001554), C (MESH:D002244)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12959570/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12959570/full.md

## References

66 references — full list in the complete paper: https://tomesphere.com/paper/PMC12959570/full.md

---
Source: https://tomesphere.com/paper/PMC12959570