DARWIN 1.5: Large Language Models as Materials Science Adapted Learners
Tong Xie, Yuwei Wan, Yixuan Liu, Yuchen Zeng, Shaozhou Wang, Wenjie Zhang, Clara Grazian, Chunyu Kit, Wanli Ouyang, Dongzhan Zhou, Bram Hoex

TL;DR
DARWIN 1.5 leverages large language models trained on extensive materials science literature and datasets to improve material property prediction and discovery without relying on traditional descriptors, demonstrating superior accuracy and transferability.
Contribution
This work introduces DARWIN 1.5, the largest open-source LLM for materials science, integrating diverse data sources to enable flexible, descriptor-free material property prediction and discovery.
Findings
Achieves up to 59.1% improvement in prediction accuracy.
Outperforms state-of-the-art machine learning methods across 8 tasks.
Integrates 6 million papers and 21 datasets for cross-task knowledge transfer.
Abstract
Materials discovery and design aim to find compositions and structures with desirable properties over highly complex and diverse physical spaces. Traditional solutions, such as high-throughput simulations or machine learning, often rely on complex descriptors, which hinder generalizability and transferability across different material systems. Moreover, these descriptors may inadequately represent macro-scale material properties, which are influenced by structural imperfections and compositional variations in real-world samples, thus limiting their practical applicability. To address these challenges, we propose DARWIN 1.5, the largest open-source large language model tailored for materials science. By leveraging natural language as input, DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. Our…
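As a rough illustration of what descriptor-free, natural-language input could look like, the sketch below turns a composition and a target property into a plain-text question and pairs it with an answer as an instruction-style record. The prompt template, the function names `build_prompt` and `build_training_record`, and the record keys are assumptions made for this example; they are not taken from the paper or from the DARWIN codebase. The GaAs band gap value is used only as a familiar illustrative number.

```python
# Minimal sketch (hypothetical template, not DARWIN's actual data format):
# express a property-prediction task as text instead of numeric descriptors.

def build_prompt(composition: str, target_property: str, unit: str = "") -> str:
    """Format a property-prediction query as a plain-text question."""
    unit_part = f" in {unit}" if unit else ""
    return (
        f"What is the {target_property}{unit_part} of the material "
        f"with composition {composition}?"
    )


def build_training_record(composition: str, target_property: str,
                          value: float, unit: str = "") -> dict:
    """Pair the question with its answer as an instruction-tuning record."""
    answer = f"{value} {unit}".strip()
    return {
        "instruction": build_prompt(composition, target_property, unit),
        "output": answer,
    }


# Illustrative record only; the value is not a result from the paper.
example = build_training_record("GaAs", "band gap", 1.42, "eV")
print(example["instruction"])
print(example["output"])
```

Because every task is reduced to the same text-in, text-out shape, records for different properties and material systems can share one training set, which is the kind of unified setup the abstract describes.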
Taxonomy
Topics: Machine Learning in Materials Science
Methods: Balanced Selection
