Scale-Dependent Input Representation and Confidence Estimation for LLMs in Materials Property Prediction
Shuichiro Ozawa, Izumi Takahara, Teruyasu Mizoguchi

TL;DR
This study evaluates how input representation and model scale affect LLM performance in materials property prediction, and proposes mean negative log-likelihood (NLL) as a confidence measure.
Contribution
It systematically compares input representations across different model scales and introduces mean NLL as a practical confidence estimator.
Findings
Optimal input representation depends on model scale.
Crystal summaries with space-group information outperform composition-only inputs.
Lower mean NLL correlates with smaller prediction errors in fine-tuned models.
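As a rough illustration of the confidence measure named above, the sketch below computes a mean negative log-likelihood from per-token log-probabilities. The summary does not give the paper's exact definition, so averaging over the generated answer tokens is an assumption here, and the function name `mean_nll` and the example log-probabilities are hypothetical.

```python
import math
from typing import Sequence


def mean_nll(token_logprobs: Sequence[float]) -> float:
    """Mean negative log-likelihood over a sequence of token log-probabilities.

    Assumption: the log-probabilities are natural-log values that the model
    assigned to the tokens it actually generated for the predicted property.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


# Hypothetical per-token probabilities for two predictions (not from the paper).
confident = [math.log(p) for p in (0.90, 0.80, 0.95)]
uncertain = [math.log(p) for p in (0.40, 0.30, 0.50)]

# A lower mean NLL means the model assigned higher probability to its own output.
print(f"confident prediction: {mean_nll(confident):.3f}")
print(f"uncertain prediction: {mean_nll(uncertain):.3f}")
```

Under this reading, predictions could be ranked or filtered by mean NLL, with lower values treated as more trustworthy; any threshold would have to be chosen on held-out data.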
Abstract
Large language models (LLMs) are increasingly applied to materials science. However, the relationship between prediction accuracy, input representation, and model scale remains unclear, and reliable methods for assessing prediction confidence have not yet been established. In this study, we fine-tune two Llama models of different scales (1B and 8B) using low-rank adaptation (LoRA) on an inorganic crystal structure dataset. We systematically evaluate five input representations, namely chemical composition, crystal summary, local environment description, full text description, and crystallographic information files (CIF), for formation energy and bandgap prediction. Our results show that the optimal input representation depends on model scale. The 1B model performs better with compact representations, whereas the 8B model maintains high accuracy even with longer natural-language…
