CACTUS: Chemistry Agent Connecting Tool-Usage to Science

Andrew D. McNaughton; Gautham Ramalaxmi; Agustin Kruel; Carter R. Knutson; Rohith A. Varikoti; Neeraj Kumar

arXiv:2405.00972 · cs.CL · October 30, 2024 · 5 cites

TL;DR

CACTUS is an innovative LLM-based agent that integrates cheminformatics tools to enhance reasoning and problem-solving in chemistry, outperforming baseline models and enabling advanced molecular discovery tasks.

Contribution

This paper introduces CACTUS, a novel framework combining open-source LLMs with domain-specific tools for improved chemistry research and molecular discovery.

Findings

01. CACTUS significantly outperforms baseline LLMs on chemistry questions.

02. Prompt engineering and hardware configurations impact model performance.

03. Smaller models can be effectively deployed on consumer hardware without major accuracy loss.

Abstract

Large language models (LLMs) have shown remarkable potential in various domains, but they often lack the ability to access and reason over domain-specific knowledge and tools. In this paper, we introduce CACTUS (Chemistry Agent Connecting Tool-Usage to Science), an LLM-based agent that integrates cheminformatics tools to enable advanced reasoning and problem-solving in chemistry and molecular discovery. We evaluate the performance of CACTUS using a diverse set of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b, on a benchmark of thousands of chemistry questions. Our results demonstrate that CACTUS significantly outperforms baseline LLMs, with the Gemma-7b and Mistral-7b models achieving the highest accuracy regardless of the prompting strategy used. Moreover, we explore the impact of domain-specific prompting and hardware configurations on model…
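To make the idea of an agent that routes chemistry questions to tools more concrete, here is a minimal sketch in Python. It is not taken from the CACTUS codebase. The tool names, the flat-formula parser, and the keyword-based `choose_tool` function are all hypothetical stand-ins: in the system the abstract describes, an LLM would pick the tool and real cheminformatics tools would compute the result.

```python
import re
from typing import Callable, Dict, Tuple

# Standard atomic weights (rounded) for a few common elements.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

# Matches an element symbol followed by an optional count, e.g. "C2" or "O".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula: str) -> float:
    """Toy tool: sum atomic weights for a flat formula such as 'C2H6O'."""
    total = 0.0
    for symbol, count in TOKEN.findall(formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


def heavy_atom_count(formula: str) -> int:
    """Toy tool: count non-hydrogen atoms in a flat formula."""
    return sum(
        int(count) if count else 1
        for symbol, count in TOKEN.findall(formula)
        if symbol != "H"
    )


# Hypothetical tool registry: the agent may only call what is listed here.
TOOLS: Dict[str, Callable[[str], object]] = {
    "molecular_weight": molecular_weight,
    "heavy_atom_count": heavy_atom_count,
}


def choose_tool(question: str) -> str:
    """Assumed stand-in for the LLM's tool-selection step (keyword match)."""
    q = question.lower()
    if "weight" in q:
        return "molecular_weight"
    if "heavy atom" in q:
        return "heavy_atom_count"
    raise ValueError(f"no tool matches question: {question!r}")


def answer(question: str, formula: str) -> Tuple[str, object]:
    """Select a tool for the question, run it, and return (tool name, result)."""
    tool_name = choose_tool(question)
    return tool_name, TOOLS[tool_name](formula)


if __name__ == "__main__":
    print(answer("What is the molecular weight?", "C2H6O"))
    print(answer("How many heavy atoms does it have?", "C2H6O"))
```

The point of the sketch is only the separation of concerns: the selection step decides which tool applies, and the numeric answer comes from a deterministic tool rather than from the language model itself.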

Code & Models

Repositories

pnnl/cactus (PyTorch, official)

Taxonomy

Topics: Various Chemistry Research Topics

Methods: Sparse Evolutionary Training