Can LLMs Predict Polymer Physics Just by Reading Synthesis and Processing Prose?
Yuchu Liu, Rui Zhu, Jingwei Xiong, Haixu Tang

TL;DR
This paper introduces PolyLM, a language model trained on scientific literature that predicts polymer properties directly from unstructured text, outperforming traditional structure-based models.
Contribution
The work presents PolyLM, a novel language-based framework trained on a large literature dataset, enabling accurate prediction of polymer properties without structural inputs.
Findings
PolyLM achieves a median R^2 of 0.74 across 22 properties.
Predictions for key properties often exceed an R^2 of 0.80.
The model outperforms existing structure-based prediction methods.
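The headline figure combines per-property scores into a single median. Below is a minimal sketch of how a median R^2 over several properties could be computed. It assumes each property is scored on its own with the standard coefficient of determination, 1 - SS_res / SS_tot. The helper names and the three toy properties are hypothetical illustrations, not the paper's code, data, or results.

```python
from statistics import median


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


def median_r2(per_property):
    """Score each property separately, then take the median across properties.

    per_property maps a (hypothetical) property name to (y_true, y_pred) lists.
    Returns the median score and the per-property scores.
    """
    scores = {name: r2_score(t, p) for name, (t, p) in per_property.items()}
    return median(scores.values()), scores


# Hypothetical toy data: one perfect fit, one mean-only predictor, one near fit.
toy = {
    "toy_a": ([1, 2, 3, 4], [1, 2, 3, 4]),          # R^2 = 1.0
    "toy_b": ([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5]),  # R^2 = 0.0
    "toy_c": ([1, 2, 3, 4], [1, 2, 3, 5]),          # R^2 = 0.8
}
med, per_prop = median_r2(toy)
print(f"median R^2 = {med:.2f}")
```

A median is less sensitive than a mean to one or two properties that are very hard to predict. That is one plausible reason to report it when scores vary widely across many properties.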
Abstract
Can large language models predict physical and mechanical polymer properties simply by reading unstructured scientific prose? Polymer performance is rarely determined by chemical structure alone; identical nominal polymers can exhibit drastically different behaviors depending on their synthesis route, processing history, morphology, and testing conditions. Yet state-of-the-art polymer property models typically rely on structure-only representations (such as SMILES or molecular graphs), which strip away this vital experimental context. In this work, we introduce PolyLM, a natural-language-only, process- and condition-aware framework that predicts materials performance directly from full-text literature. By circumventing structural inputs entirely, PolyLM preserves the nuanced, unstructured descriptions of synthesis and processing reported by domain scientists. To train this…
