ChatGPT Chemistry Assistant for Text Mining and Prediction of MOF Synthesis
Zhiling Zheng, Oufan Zhang, Christian Borgs, Jennifer T. Chayes, Omar M. Yaghi

TL;DR
This paper presents a ChatGPT-based system that automates text mining of MOF synthesis data from the scientific literature, achieving 90-99% precision, recall, and F1 scores and enabling, without coding, both predictive modeling and a chemistry chatbot.
Contribution
The authors developed a prompt-engineered ChatGPT workflow for extracting MOF synthesis parameters, improving data accuracy and enabling predictive and interactive tools in chemistry.
Findings
Achieved 90-99% precision, recall, and F1 scores in text mining.
Built a dataset of 26,257 synthesis parameters for ~800 MOFs.
Created a predictive model with over 86% accuracy for MOF crystallization outcomes.
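For readers unfamiliar with the metrics in the first finding, the sketch below shows the standard definitions of precision, recall, and F1 applied to a text-mining result. It is illustrative only: the function name, the tuple layout, and the sample data (including the placeholder "MOF-X") are assumptions, and the summary does not say how the authors matched extracted parameters against their reference labels.

```python
def precision_recall_f1(extracted, reference):
    """Score extracted items against a hand-labelled reference set.

    Both arguments are iterables of hashable items; an extracted item
    counts as correct only if it matches a reference item exactly.
    """
    extracted, reference = set(extracted), set(reference)
    true_pos = len(extracted & reference)

    # Precision: fraction of extracted items that are correct.
    precision = true_pos / len(extracted) if extracted else 0.0
    # Recall: fraction of reference items that were found.
    recall = true_pos / len(reference) if reference else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical (compound, parameter, value) triples, not data from the paper.
reference = {
    ("MOF-X", "temperature", "120 C"),
    ("MOF-X", "time", "24 h"),
    ("MOF-X", "solvent", "DMF"),
    ("MOF-X", "metal source", "Zn(NO3)2"),
}
extracted = {
    ("MOF-X", "temperature", "120 C"),
    ("MOF-X", "time", "24 h"),
    ("MOF-X", "solvent", "DMF"),
    ("MOF-X", "modulator", "acetic acid"),  # not in the reference: a false positive
}

p, r, f1 = precision_recall_f1(extracted, reference)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Exact matching is the strictest possible criterion; a real evaluation would probably normalize units and synonyms before comparing.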
Abstract
We use prompt engineering to guide ChatGPT in the automation of text mining of metal-organic frameworks (MOFs) synthesis conditions from diverse formats and styles of the scientific literature. This effectively mitigates ChatGPT's tendency to hallucinate information -- an issue that previously made the use of Large Language Models (LLMs) in scientific fields challenging. Our approach involves the development of a workflow implementing three different processes for text mining, programmed by ChatGPT itself. All of them enable parsing, searching, filtering, classification, summarization, and data unification with different tradeoffs between labor, speed, and accuracy. We deploy this system to extract 26,257 distinct synthesis parameters pertaining to approximately 800 MOFs sourced from peer-reviewed research articles. This process incorporates our ChemPrompt Engineering strategy to…
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods
