Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning
Juejing Liu, Haydn Anderson, Noah I. Waxman, Vsevolod Kovalev, Byron Fisher, Elizabeth Li, Xiaofeng Guo

TL;DR
This paper demonstrates how large language models can automate literature reviews and how a machine learning model trained on the extracted data can predict thermodynamic parameters, with the aim of accelerating research in chemistry and materials science.
Contribution
It introduces LMExt, an LLM-based literature review tool that extracts chemical data into a machine-readable structure, and trains an ML model on the automatically acquired data to predict thermodynamic parameters.
Findings
Successful extraction of chemical information from diverse literature sources
Accurate prediction of thermodynamic parameters using ML models
Demonstration of integrated ML approaches transforming research workflows
Abstract
New discoveries in chemistry and materials science, with an ever-growing volume of requisite knowledge and experimental workload, provide unique opportunities for machine learning (ML) to play a critical role in improving research efficiency. Here, we demonstrate (1) the use of large language models (LLMs) for automated literature reviews, and (2) the training of an ML model to predict chemical knowledge (thermodynamic parameters). Our LLM-based literature review tool (LMExt) successfully extracted chemical information, and information from other fields, into a machine-readable structure, including stability constants for metal cation-ligand interactions, thermodynamic properties, and broader data types (medical research papers and financial reports), effectively overcoming the challenges inherent in each domain. Using the autonomous acquisition of thermodynamic data, an ML model was trained…
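To make "machine-readable structure" concrete, here is a minimal Python sketch of what one extracted stability-constant record could look like, and how it ties to a thermodynamic parameter through the standard relation ΔG° = −RT ln β = −2.303 RT log₁₀ β. It is an illustration under stated assumptions, not LMExt's actual schema: the class name `StabilityConstantRecord`, its fields, the helper `parse_records`, and the JSON layout are all hypothetical, and the numbers in the usage example are placeholders rather than data from the paper.

```python
import json
import math
from dataclasses import dataclass
from typing import List

R = 8.314462618  # molar gas constant, J/(mol*K)


@dataclass
class StabilityConstantRecord:
    """Hypothetical record for one metal-ligand stability constant."""
    metal: str                    # e.g. a metal cation label
    ligand: str                   # ligand name or formula
    log_beta: float               # log10 of the stability constant
    temperature_k: float = 298.15
    source: str = ""              # citation string for provenance

    def gibbs_energy_kj_mol(self) -> float:
        # Standard relation: dG = -R * T * ln(beta) = -R * T * ln(10) * log10(beta)
        return -R * self.temperature_k * math.log(10) * self.log_beta / 1000.0


def parse_records(llm_json: str) -> List[StabilityConstantRecord]:
    """Turn a JSON array (assumed LLM output format) into typed records."""
    return [StabilityConstantRecord(**row) for row in json.loads(llm_json)]


# Placeholder input, not data from the paper.
example = '[{"metal": "M", "ligand": "L", "log_beta": 10.0}]'
records = parse_records(example)
delta_g = records[0].gibbs_energy_kj_mol()
```

With the placeholder value log₁₀ β = 10 at 298.15 K, `delta_g` comes out near −57.1 kJ/mol. Typed records of this kind are one way extracted values could be validated and passed on as training data for a predictive model.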
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Crystallography and molecular interactions
