Integrating Chemistry Knowledge in Large Language Models via Prompt Engineering

Hongxuan Liu; Haoyu Yin; Zhiyao Luo; Xiaonan Wang

arXiv:2404.14467 · cs.CL · April 24, 2024

TL;DR

This study demonstrates that integrating domain-specific chemical knowledge into prompt engineering significantly improves large language models' accuracy and relevance in scientific tasks, enabling better scientific discovery.

Contribution

The paper introduces a novel domain-knowledge embedded prompt engineering method and a curated benchmark dataset for chemical and biological scientific domains.

Findings

01. Enhanced model performance on chemical and biological tasks

02. Reduced hallucinations in LLM outputs

03. Successful case studies on complex materials

Abstract

This paper presents a study on the integration of domain-specific knowledge in prompt engineering to enhance the performance of large language models (LLMs) in scientific domains. A benchmark dataset is curated to capture the intricate physical-chemical properties of small molecules and their druggability for pharmacology, alongside the functional attributes of enzymes and crystal materials, underscoring the relevance and applicability across biological and chemical domains. The proposed domain-knowledge embedded prompt engineering method outperforms traditional prompt engineering strategies on various metrics, including capability, accuracy, F1 score, and hallucination drop. The effectiveness of the method is demonstrated through case studies on complex materials including the MacMillan catalyst, paclitaxel, and lithium cobalt oxide. The results suggest that domain-knowledge prompts can…
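
To make the idea of a domain-knowledge embedded prompt concrete, here is a minimal sketch of how such a prompt might be assembled. It is an illustration only, not the authors' implementation: the `DomainPrompt` class, its fields, and the example facts are all hypothetical and do not come from the paper or its repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DomainPrompt:
    """Hypothetical prompt container; not taken from the paper or its repo."""

    role: str                                    # expert persona for the model
    knowledge: list[str] = field(default_factory=list)            # domain facts or rules
    examples: list[tuple[str, str]] = field(default_factory=list)  # few-shot Q/A pairs

    def render(self, question: str) -> str:
        """Build one prompt string: role, domain facts, examples, then the question."""
        parts = [f"You are {self.role}."]
        if self.knowledge:
            parts.append("Relevant domain knowledge:")
            parts.extend(f"- {fact}" for fact in self.knowledge)
        for q, a in self.examples:
            parts.append(f"Q: {q}\nA: {a}")
        parts.append(f"Q: {question}\nA:")
        return "\n".join(parts)


# Assumed example content, chosen only for illustration.
prompt = DomainPrompt(
    role="an expert chemist",
    knowledge=[
        "Lithium cobalt oxide has the formula LiCoO2.",
        "Answer with a number and unit; say 'unknown' if unsure.",
    ],
    examples=[("What is the molecular formula of water?", "H2O")],
)
text = prompt.render("What is the molar mass of LiCoO2?")
print(text)
```

The rendered string can be passed to any LLM client. Keeping the domain facts in a separate list makes it easy to compare the same question with and without embedded knowledge, which is the kind of comparison the abstract describes.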

Code & Models

Repositories

harrylaucngd/prompt-eng-master (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies