LLM-Prop: Predicting Physical And Electronic Properties Of Crystalline Solids From Their Text Descriptions
Andre Niyongabo Rubungo, Craig Arnold, Barry P. Rand, Adji Bousso Dieng

TL;DR
This paper introduces LLM-Prop, a method that uses large language models to predict crystal properties from text descriptions. It outperforms existing graph neural network methods and is accompanied by a new benchmark dataset for the task.
Contribution
The paper presents a new benchmark dataset (TextEdge) and demonstrates that LLMs can predict crystal properties from text descriptions, surpassing GNN-based methods.
Findings
LLM-Prop outperforms GNNs in predicting band gap and unit cell volume.
LLM-Prop surpasses a domain-specific BERT model despite fewer parameters.
Current GNNs struggle to capture symmetry-related information for crystal property prediction.
Abstract
The prediction of crystal properties plays a crucial role in the crystal design process. Current methods for predicting crystal properties focus on modeling crystal structures using graph neural networks (GNNs). Although GNNs are powerful, accurately modeling the complex interactions between atoms and molecules within a crystal remains a challenge. Surprisingly, predicting crystal properties from crystal text descriptions is understudied, despite the rich information and expressiveness that text data offer. One of the main reasons is the lack of publicly available data for this task. In this paper, we develop and make public a benchmark dataset (called TextEdge) that contains text descriptions of crystal structures with their properties. We then propose LLM-Prop, a method that leverages the general-purpose learning capabilities of large language models (LLMs) to predict the physical and…
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · X-ray Diffraction in Crystallography
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Softmax · Dense Connections · Adam · Residual Connection · WordPiece
