A Transformer Based Generative Chemical Language AI Model for Structural Elucidation of Organic Compounds
Xiaofeng Tan

TL;DR
This paper introduces a transformer-based AI model that rapidly and accurately predicts the structures of organic compounds from spectroscopic data, replacing traditional expert systems with a more efficient end-to-end approach.
Contribution
The study presents a transformer-based generative model for chemical structure elucidation that reaches 83% top-15 accuracy and generates candidate structures within seconds on a CPU, suggesting that an end-to-end model could replace the search-and-filter workflow of classic CASE systems.
Findings
Achieves 83% top-15 accuracy: the correct structure appears among the 15 highest-ranked candidates in 83% of cases.
Generates structures within seconds on a CPU.
Trained on about 102,000 IR, UV, and NMR spectra.
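The top-15 figure above is a top-k accuracy: a prediction counts as correct when the true structure appears anywhere among the first k ranked candidates. The following is a minimal sketch of that metric, not the paper's evaluation code. The function name `top_k_accuracy`, the toy data, and the use of SMILES strings as structure labels are all assumptions made for illustration; the sketch also assumes candidates and references are already in one canonical form, so plain string equality is a valid match test.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_candidates: Sequence[Sequence[str]],
    true_structures: Sequence[str],
    k: int = 15,
) -> float:
    """Fraction of cases whose true structure is among the first k candidates.

    Hypothetical helper: assumes each candidate list is sorted from most to
    least probable and that all strings share one canonical representation.
    """
    if len(ranked_candidates) != len(true_structures):
        raise ValueError("need one candidate list per reference structure")
    if not true_structures:
        raise ValueError("no cases to score")
    hits = sum(
        truth in candidates[:k]
        for candidates, truth in zip(ranked_candidates, true_structures)
    )
    return hits / len(true_structures)


# Toy data (illustrative only, not taken from the paper).
predictions = [
    ["CCO", "COC", "CC=O"],          # true structure ranked first
    ["CC(=O)O", "OCC=O", "COC=O"],   # true structure ranked third
    ["c1ccccc1", "C1CCCCC1"],        # true structure never generated
]
references = ["CCO", "COC=O", "Cc1ccccc1"]

top1 = top_k_accuracy(predictions, references, k=1)
top3 = top_k_accuracy(predictions, references, k=3)
print(f"top-1: {top1:.3f}, top-3: {top3:.3f}")
```

On this toy set the score rises from one hit in three at k=1 to two hits in three at k=3, which is why a top-15 figure should not be read as the rate at which the first-ranked candidate is correct.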
Abstract
For over half a century, computer-aided structural elucidation (CASE) systems for organic compounds have relied on complex expert systems with explicitly programmed algorithms. These systems are often computationally inefficient for complex compounds because of the vast chemical structural space that must be explored and filtered. In this study, we present a proof-of-concept transformer-based generative chemical language artificial intelligence (AI) model, an end-to-end architecture designed to replace the logic and workflow of the classic CASE framework for ultra-fast and accurate spectroscopy-based structural elucidation. Our model employs an encoder-decoder architecture and self-attention mechanisms, similar to those in large language models, to directly generate the most probable chemical structures that match the input spectroscopic data. Trained on ~102k IR, UV, and 1H…
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods
