Benchmarking Large Language Models for Molecule Prediction Tasks

Zhiqiang Zhong; Kuangyu Zhou; Davide Mottin

arXiv:2403.05075 · cs.LG · March 11, 2024 · 6 cites

TL;DR

This paper evaluates the effectiveness of Large Language Models in molecule prediction tasks, comparing their performance with specialized ML models, and explores how LLMs can complement existing methods despite current limitations.

Contribution

It provides a systematic assessment of LLMs on molecule prediction tasks and discusses potential ways to leverage LLMs alongside traditional models.

Findings

01. LLMs generally underperform specialized ML models on molecule prediction tasks.

02. LLMs can improve ML model performance when the two are used together.

03. Challenges include LLMs' limited understanding of graph-structured data.

Abstract

Large Language Models (LLMs) stand at the forefront of a number of Natural Language Processing (NLP) tasks. Despite the widespread adoption of LLMs in NLP, much of their potential in broader fields remains largely unexplored, and significant limitations persist in their design and implementation. Notably, LLMs struggle with structured data, such as graphs, and often falter when tasked with answering domain-specific questions requiring deep expertise, such as those in biology and chemistry. In this paper, we explore a fundamental question: Can LLMs effectively handle molecule prediction tasks? Rather than pursuing top-tier performance, our goal is to assess how LLMs can contribute to diverse molecule tasks. We identify several classification and regression prediction tasks across six standard molecule datasets. Subsequently, we carefully design a set of prompts to query LLMs on these…
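
The abstract mentions designing prompts to query LLMs on classification and regression tasks. The sketch below illustrates what such a prompt builder and reply parser might look like. It is not the paper's prompt set: the function names `build_prompt` and `parse_answer`, the prompt wording, the example question, and the few-shot label are all hypothetical, and molecules are assumed to be given as SMILES strings.

```python
from typing import Optional, Sequence, Tuple, Union


def build_prompt(
    smiles: str,
    task: str,
    question: str,
    examples: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Assemble a text prompt asking an LLM about one molecule.

    The wording is a hypothetical template, not taken from the paper.
    """
    if task not in ("classification", "regression"):
        raise ValueError(f"unknown task: {task!r}")

    lines = ["You are an expert in chemistry."]
    # Optional few-shot block: (SMILES, answer) pairs shown before the query.
    for ex_smiles, ex_answer in examples or ():
        lines.append(f"SMILES: {ex_smiles}\nAnswer: {ex_answer}")
    lines.append(f"SMILES: {smiles}")
    lines.append(question)
    # Constrain the output format so the reply can be parsed mechanically.
    if task == "classification":
        lines.append("Answer with exactly one word: Yes or No.")
    else:
        lines.append("Answer with a single number and nothing else.")
    return "\n".join(lines)


def parse_answer(reply: str, task: str) -> Union[int, float, None]:
    """Map a raw LLM reply to a label (1/0), a float, or None if unparseable."""
    tokens = reply.strip().split()
    if not tokens:
        return None
    if task == "classification":
        first = tokens[0].strip(".,!").lower()
        return {"yes": 1, "no": 0}.get(first)
    try:
        return float(tokens[0])
    except ValueError:
        return None


# Usage with made-up inputs (the few-shot label is illustrative only).
prompt = build_prompt(
    "CCO",  # ethanol
    "classification",
    "Is this molecule toxic?",
    examples=[("C", "No")],
)
```

Returning `None` for unparseable replies keeps malformed LLM outputs separate from genuine predictions, so they can be counted or retried rather than silently scored as a class.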

Code & Models

Repositories

zhiqiangzhongddu/llmamol (PyTorch, official)

Taxonomy

Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Genetics, Bioinformatics, and Biomedical Research

Methods: Sparse Evolutionary Training