DiSE: A diffusion probabilistic model for automatic structure elucidation of organic compounds
Haochen Chen, Qi Huang, Anan Wu, Wenhao Zhang, Jianliang Ye, Jianming Wu, Kai Tan, Xin Lu, Xin Xu

TL;DR
DiSE is a diffusion-based generative model that integrates multiple spectroscopic data types to automate and improve the accuracy of organic compound structure elucidation, advancing autonomous chemical analysis.
Contribution
We introduce DiSE, a novel diffusion model that combines various spectroscopic modalities for automated and accurate structure determination of organic molecules.
Findings
Achieves superior accuracy in structure elucidation
Demonstrates strong generalization across diverse datasets
Robust to experimental data despite training on calculated spectra
Abstract
Automatic structure elucidation is essential for self-driving laboratories because it enables the system to become truly autonomous. This capability closes the experimental feedback loop, ensuring that machine learning models receive reliable structure information for real-time decision-making and optimization. Herein, we present DiSE, an end-to-end diffusion-based generative model that integrates multiple spectroscopic modalities, including MS, 13C and 1H chemical shifts, HSQC, and COSY, to achieve automated yet accurate structure elucidation of organic compounds. By learning inherent correlations among spectra through data-driven approaches, DiSE achieves superior accuracy, strong generalization across chemically diverse datasets, and robustness to experimental data despite being trained on calculated spectra. DiSE thus represents a significant advance toward fully automated structure…
