What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks
Taicheng Guo, Kehan Guo, Bozhao Nan, Zhenwen Liang, Zhichun Guo, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang

TL;DR
This paper evaluates the capabilities of large language models in chemistry across eight tasks, revealing GPT-4's superior performance and highlighting current limitations of LLMs in practical chemistry applications.
Contribution
It establishes a comprehensive benchmark for LLMs in chemistry, analyzing performance across multiple tasks and providing insights into their strengths and limitations.
Findings
GPT-4 outperforms other models in chemistry tasks
LLMs show varying performance levels across tasks
In-context learning impacts LLMs' effectiveness in chemistry
Abstract
Large Language Models (LLMs) with strong abilities in natural language processing tasks have emerged and have been applied in various kinds of areas such as science, finance and software engineering. However, the capability of LLMs to advance the field of chemistry remains unclear. In this paper, rather than pursuing state-of-the-art performance, we aim to evaluate capabilities of LLMs in a wide range of tasks across the chemistry domain. We identify three key chemistry-related capabilities including understanding, reasoning and explaining to explore in LLMs and establish a benchmark containing eight chemistry tasks. Our analysis draws on widely recognized datasets facilitating a broad exploration of the capacities of LLMs within the context of practical chemistry. Five LLMs (GPT-4, GPT-3.5, Davinci-003, Llama and Galactica) are evaluated for each chemistry task in zero-shot and…
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Linear Layer · Label Smoothing · Absolute Position Encodings · Cosine Annealing · Transformer
