Integrating Chemistry Knowledge in Large Language Models via Prompt Engineering
Hongxuan Liu, Haoyu Yin, Zhiyao Luo, Xiaonan Wang

TL;DR
This study demonstrates that integrating domain-specific chemical knowledge into prompt engineering significantly improves large language models' accuracy and relevance in scientific tasks, enabling better scientific discovery.
Contribution
The paper introduces a novel domain-knowledge embedded prompt engineering method and a curated benchmark dataset for chemical and biological scientific domains.
Findings
Enhanced model performance on chemical and biological tasks
Reduced hallucinations in LLM outputs
Successful case studies on complex materials
Abstract
This paper presents a study on the integration of domain-specific knowledge in prompt engineering to enhance the performance of large language models (LLMs) in scientific domains. A benchmark dataset is curated to encapsulate the intricate physical-chemical properties of small molecules, their druggability for pharmacology, alongside the functional attributes of enzymes and crystal materials, underscoring the relevance and applicability across biological and chemical domains. The proposed domain-knowledge embedded prompt engineering method outperforms traditional prompt engineering strategies on various metrics, including capability, accuracy, F1 score, and hallucination drop. The effectiveness of the method is demonstrated through case studies on complex materials including the MacMillan catalyst, paclitaxel, and lithium cobalt oxide. The results suggest that domain-knowledge prompts can…
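To make the idea of a domain-knowledge embedded prompt concrete, here is a minimal sketch, not the authors' implementation. The abstract does not describe the actual prompt template, so the function name `build_domain_prompt`, the role string, and the hint wording are all hypothetical choices for illustration. The only point it shows is that domain facts are placed in the prompt ahead of the question.

```python
from typing import Sequence


def build_domain_prompt(
    question: str,
    domain_hints: Sequence[str],
    role: str = "an expert chemist",
) -> str:
    """Assemble a prompt that states domain hints before the question.

    Hypothetical helper: the template is an assumption, not taken from the paper.
    """
    lines = [f"You are {role}."]
    if domain_hints:
        lines.append("Use the following domain knowledge when answering:")
        lines.extend(f"- {hint}" for hint in domain_hints)
    # Asking the model to admit uncertainty is one common way to discourage
    # hallucinated answers; whether the paper does this is not stated.
    lines.append("If the answer cannot be determined, say so instead of guessing.")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


prompt = build_domain_prompt(
    "What is the oxidation state of cobalt in lithium cobalt oxide (LiCoO2)?",
    [
        "Li is +1 in ionic oxides.",
        "O is -2 in oxides.",
        "The oxidation states in a neutral compound sum to zero.",
    ],
)
print(prompt)
```

The resulting string can be sent to any LLM client; comparing answers with and without the hint list is one simple way to probe the effect the abstract describes.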
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
