From Words to Molecules: A Survey of Large Language Models in Chemistry
Chang Liao, Yemin Yu, Yu Mei, Ying Wei

TL;DR
This survey reviews how large language models are adapted and applied to chemistry, discussing input representations, domain-specific training, applications, and future research directions in this interdisciplinary field.
Contribution
It categorizes chemical LLMs based on input modalities, analyzes their training objectives, and explores diverse applications, providing a comprehensive overview of current methodologies and future prospects.
Findings
Chemical LLMs are categorized into three groups based on the domain and modality of their input data.
Various input representation and tokenization methods are employed for molecular data.
Promising research directions include integrating chemical knowledge and improving interpretability.
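To make the tokenization finding concrete, here is a minimal sketch comparing two common ways of splitting a SMILES string into tokens: character by character, and atom by atom with a regular expression. The regex, the function names, and the example molecule are illustrative assumptions and are not taken from the survey; the pattern covers only a common subset of SMILES syntax.

```python
import re

# Atom-level pattern (an assumption for illustration, covering a common SMILES subset):
# bracket atoms, two-letter halogens, organic-subset atoms, aromatic atoms,
# two-digit ring closures (%NN), single-digit ring closures, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[()=#\-+\\/:.~@*$])"
)


def tokenize_charwise(smiles: str) -> list[str]:
    """Split a SMILES string into single characters (splits 'Cl' into 'C', 'l')."""
    return list(smiles)


def tokenize_atomwise(smiles: str) -> list[str]:
    """Split a SMILES string so multi-character atoms stay in one token."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject input containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    acetyl_chloride = "CC(=O)Cl"
    print(tokenize_charwise(acetyl_chloride))  # ['C', 'C', '(', '=', 'O', ')', 'C', 'l']
    print(tokenize_atomwise(acetyl_chloride))  # ['C', 'C', '(', '=', 'O', ')', 'Cl']
```

The character-level split breaks chlorine into a carbon-like `C` and a stray `l`, while the atom-level split keeps `Cl` and bracket atoms such as `[N+]` intact. This is one reason the choice of tokenization matters for molecular inputs.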
Abstract
In recent years, Large Language Models (LLMs) have achieved significant success in natural language processing (NLP) and various interdisciplinary areas. However, applying LLMs to chemistry is a complex task that requires specialized domain knowledge. This paper provides a thorough exploration of the nuanced methodologies employed in integrating LLMs into the field of chemistry, delving into the complexities and innovations at this interdisciplinary juncture. Specifically, our analysis begins with examining how molecular information is fed into LLMs through various representation and tokenization methods. We then categorize chemical LLMs into three distinct groups based on the domain and modality of their input data, and discuss approaches for integrating these inputs for LLMs. Furthermore, this paper delves into the pretraining objectives with adaptations to chemical LLMs. After that,…
Taxonomy
Topics: Machine Learning in Materials Science · History and advancements in chemistry · Topic Modeling
