# Unveiling key descriptors via machine learning: toward rational molecular design of chromophores with excited-state intramolecular proton transfer

**Authors:** Shengsheng Wei, Zipeng Yang, Chao Yang, Hongmei Zhao, Yang Li, Yuanyuan Guo, Andong Xia, Zhuoran Kuang

PMC · DOI: 10.1039/d5sc07051a · Chemical Science · 2026-02-09

## TL;DR

This paper presents a machine learning approach for predicting the excited-state energy difference (ΔE*) of ESIPT molecules and designing new ones, validated by synthesizing two AI-designed compounds that show distinct N*/T* dual emission.

## Contribution

A data-driven framework that predicts ΔE* for ESIPT molecules, interprets which H-bond descriptors drive it, generates new candidates, and validates the design experimentally.

## Key findings

- An interpretable ML model identified key H-bond descriptors influencing ΔE*.
- Two AI-designed ESIPT molecules with distinct dual emission were successfully synthesized.
- The framework accelerates high-throughput screening and molecular design of ESIPT compounds.
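
To make the interpretability finding concrete, the sketch below computes exact Shapley values for a single sample by enumerating every feature coalition, which is the quantity that SHAP-style methods estimate. It is a minimal illustration only: it does not use the SHAP library or the authors' model, and the descriptor names, the toy surrogate `toy_delta_e`, its coefficients, and the sample values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample, by enumerating all coalitions."""
    n = len(x)
    features = list(range(n))

    def value(coalition):
        # Features in the coalition take the sample's values;
        # all other features stay at the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_delta_e(descriptors):
    # Hypothetical surrogate for delta-E*; coefficients are invented
    # for illustration and carry no chemical meaning.
    hbond_strength, donor_acidity, acceptor_basicity = descriptors
    return (0.8 * hbond_strength
            - 0.5 * donor_acidity
            - 0.3 * acceptor_basicity
            + 0.2 * donor_acidity * acceptor_basicity)


names = ["hbond_strength", "donor_acidity", "acceptor_basicity"]
sample = [1.0, 2.0, 1.0]      # hypothetical descriptor values
baseline = [0.0, 0.0, 0.0]    # hypothetical reference point

phi = shapley_values(toy_delta_e, sample, baseline)
for name, contribution in zip(names, phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for the sample and the prediction at the baseline, and the interaction term is split evenly between the two descriptors involved. Exact enumeration scales exponentially with the number of descriptors, so it only suits small descriptor sets like this one.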

## Abstract

Precise design of excited-state intramolecular proton transfer (ESIPT) molecules targeting advanced optoelectronic or biological sensing applications presents a fundamental challenge. Controlling the energy difference (ΔE*) between normal (N*) and tautomeric (T*) excited-state forms is crucial, yet the complex interplay of hydrogen bond (H-bond) strength, proton donor acidity, and proton acceptor basicity with ΔE* remains insufficiently explored. Conventional trial-and-error approaches for designing tailored ESIPT compounds suffer from inefficient synthesis. To address this, we constructed a high-quality ESIPT dataset by introducing ten substituents with progressively increasing electron-donating capacity into six representative ESIPT parent scaffolds. Integrating qualitative descriptors with data-driven machine learning (ML) enabled precise ΔE* prediction, significantly accelerating high-throughput screening. An interpretable Shapley additive explanations (SHAP)-based ML approach was applied to evaluate the relative importance of key H-bond descriptors while achieving accurate ΔE* prediction. Novel ESIPT candidates were generated using a variational autoencoder (VAE) model and filtered using predicted ΔE*, synthetic accessibility (SA) scores, and pharmacokinetic properties. Critically, we synthesized two AI-designed ESIPT molecules exhibiting distinct N*/T* dual emission, which provides a closed-loop experimental validation of this data-driven molecular design strategy. This work establishes a predictive framework for accurate ΔE* determination and accelerated exploitation of novel promising ESIPT compounds.

This work introduces a data-driven, closed-loop strategy for ESIPT molecular design with ΔE* prediction, interpretation, and candidate generation. The strategy was validated by synthesizing two AI-designed molecules with distinct dual emission.
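
The screening step described above (filtering generated candidates by predicted ΔE*, SA score, and pharmacokinetic properties) can be sketched as a simple filter. This is an assumption-laden illustration, not the authors' pipeline: the `Candidate` fields, the ΔE* window, the SA cutoff, the boolean pharmacokinetic flag, and all candidate values are hypothetical placeholders, and ΔE* is left in unspecified units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                 # placeholder label, not a real molecule
    predicted_delta_e: float  # predicted delta-E*, units unspecified here
    sa_score: float           # synthetic accessibility; lower means easier
    passes_pk: bool           # hypothetical pharmacokinetic pass/fail flag


def screen(candidates, delta_e_window, max_sa):
    """Keep candidates inside the delta-E* window that are easy enough to
    synthesize and pass the pharmacokinetic flag, easiest first."""
    low, high = delta_e_window
    kept = [
        c for c in candidates
        if low <= c.predicted_delta_e <= high
        and c.sa_score <= max_sa
        and c.passes_pk
    ]
    return sorted(kept, key=lambda c: c.sa_score)


# Invented values standing in for generated candidates and model output.
generated = [
    Candidate("cand-A", -0.20, 3.5, True),
    Candidate("cand-B", 0.40, 2.0, True),    # outside the delta-E* window
    Candidate("cand-C", -0.10, 2.8, True),
    Candidate("cand-D", -0.15, 6.5, True),   # too hard to synthesize
    Candidate("cand-E", -0.05, 2.5, False),  # fails the PK flag
]

shortlist = screen(generated, delta_e_window=(-0.3, 0.0), max_sa=4.0)
for c in shortlist:
    print(c.name, c.predicted_delta_e, c.sa_score)
```

Sorting by SA score is one possible ranking choice; a real workflow might instead rank by distance from a target ΔE* or combine the criteria into a single score.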

## Full-text entities

- **Chemicals:** hydrogen (MESH:D006859)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12910287/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12910287/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/PMC12910287/full.md

---
Source: https://tomesphere.com/paper/PMC12910287