Benchmarking Large Language Models for Molecule Prediction Tasks
Zhiqiang Zhong, Kuangyu Zhou, Davide Mottin

TL;DR
This paper evaluates the effectiveness of Large Language Models in molecule prediction tasks, comparing their performance with specialized ML models, and explores how LLMs can complement existing methods despite current limitations.
Contribution
It provides a systematic assessment of LLMs on molecule prediction tasks and discusses potential ways to leverage LLMs alongside traditional models.
Findings
LLMs generally underperform specialized ML models on molecule prediction tasks.
LLMs can improve the performance of ML models when the two are used together.
Challenges include LLMs' limited understanding of graph-structured data.
Abstract
Large Language Models (LLMs) stand at the forefront of a number of Natural Language Processing (NLP) tasks. Despite the widespread adoption of LLMs in NLP, much of their potential in broader fields remains unexplored, and significant limitations persist in their design and implementation. Notably, LLMs struggle with structured data, such as graphs, and often falter when tasked with answering domain-specific questions requiring deep expertise, such as those in biology and chemistry. In this paper, we explore a fundamental question: Can LLMs effectively handle molecule prediction tasks? Rather than pursuing top-tier performance, our goal is to assess how LLMs can contribute to diverse molecule tasks. We identify several classification and regression prediction tasks across six standard molecule datasets. Subsequently, we carefully design a set of prompts to query LLMs on these…
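The abstract describes querying LLMs with designed prompts on classification and regression tasks. As a rough illustration only, a zero-shot prompt for a single molecule given as a SMILES string, together with parsing of the model's free-text reply, might be sketched as below. The template wording, the helpers `build_prompt` and `parse_answer`, and the Yes/No and single-number answer formats are hypothetical assumptions, not the authors' prompts; no LLM API is called.

```python
# Hypothetical zero-shot prompt builder for molecule property prediction.
# Template wording, task names, and answer formats are illustrative
# assumptions, not the prompts used in the paper.

def build_prompt(smiles: str, task: str, kind: str) -> str:
    """Return a text prompt asking an LLM to predict one molecule's property.

    smiles: the molecule as a SMILES string (a text serialization of its graph).
    task:   short description of the property to predict.
    kind:   "classification" or "regression".
    """
    if kind == "classification":
        answer_spec = "Answer with a single word: Yes or No."
    elif kind == "regression":
        answer_spec = "Answer with a single number and nothing else."
    else:
        raise ValueError(f"unknown task kind: {kind!r}")
    return (
        "You are an expert chemist.\n"
        f"Molecule (SMILES): {smiles}\n"
        f"Task: predict the {task} of this molecule.\n"
        f"{answer_spec}"
    )


def parse_answer(text: str, kind: str):
    """Map a free-text reply to a label (1/0) or a float; None if unparseable."""
    reply = text.strip()
    if kind == "classification":
        # Take the first word, drop trailing punctuation, and map Yes/No to 1/0.
        first = reply.split()[0].strip(".,").lower() if reply else ""
        return {"yes": 1, "no": 0}.get(first)
    try:
        # Regression: read the first token as a number.
        return float(reply.split()[0]) if reply else None
    except ValueError:
        return None


if __name__ == "__main__":
    print(build_prompt("CCO", "aqueous solubility (log mol/L)", "regression"))
```

Returning `None` for unparseable replies keeps malformed LLM outputs separate from valid predictions, so they can be counted or handled explicitly when scoring against specialized ML models.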
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Genetics, Bioinformatics, and Biomedical Research
Methods: Sparse Evolutionary Training
