Understanding Structural Representation in Foundation Models for Polymers
Nathaniel H. Park, Eduardo Soares, Victor Y. Shirasuna, Tiffany J. Callahan, Sara Capponi, Emilio Vital Brazil

TL;DR
This paper introduces a new SMILES-based polymer graph representation for foundation models and reports strong, largely representation-invariant performance on polymer property prediction across 28 benchmark datasets.
Contribution
The paper presents a novel polymer graph representation method using SMILES, improving foundation model performance and providing insights into representation invariance and model behavior.
Findings
The new representation outperforms the other representation variations tested in control experiments.
SMILES-based models are largely invariant to the choice of representation: many variations achieve near state-of-the-art results.
Models interpolate over sequence space, including over invalid inputs, which affects prediction accuracy.
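One way to make the invariance finding concrete is to feed a model several equivalent string encodings of the same repeat unit and measure how far its predictions spread. The sketch below is illustrative only and is not the paper's evaluation code: `invariance_report`, the two stand-in predictors, and the `*`-terminated repeat-unit strings (a common wildcard convention, not necessarily the paper's graph representation) are all assumptions introduced here.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, Sequence


def invariance_report(
    predict: Callable[[str], float], variants: Sequence[str]
) -> Dict[str, float]:
    """Summarize how much a predictor's output varies across string variants.

    `predict` is any function mapping a polymer string to a number
    (hypothetical here; in practice it would wrap a trained model).
    """
    preds = [predict(s) for s in variants]
    return {
        "mean": mean(preds),
        "spread": max(preds) - min(preds),  # 0.0 means fully invariant
        "std": pstdev(preds),
    }


# Three ways of writing the same -CH2-CH2-O- repeat unit, with `*`
# marking the attachment points (assumed notation for illustration).
variants = ["*CCO*", "*OCC*", "C(*)CO*"]


def composition_model(s: str) -> float:
    # Stand-in predictor that depends only on the carbon count,
    # so it cannot distinguish equivalent encodings.
    return float(s.count("C"))


def length_model(s: str) -> float:
    # Stand-in predictor that depends on string length, so it is
    # sensitive to how the structure happens to be written.
    return float(len(s))


invariant = invariance_report(composition_model, variants)
sensitive = invariance_report(length_model, variants)
print(invariant["spread"], sensitive["spread"])
```

A spread near zero across equivalent encodings indicates an invariant predictor; a large spread indicates sensitivity to the string form. The same harness could in principle be pointed at deliberately invalid strings to probe the interpolation behavior noted above.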
Abstract
From the relative scarcity of training data to the lack of standardized benchmarks, the development of foundation models for polymers faces significant and multi-faceted challenges. At the core, many of these issues are tied directly to the structural representation of polymers. Here, we present a new foundation model using a SMILES-based polymer graph representation. This approach allows representation of critical polymer architectural features and connectivity that are not available in other SMILES-based representations. The developed polymer foundation model exhibited excellent performance on 28 different benchmark datasets. Critical evaluation of the developed representation against other variations in control experiments reveals this approach to be a highly performant method of representing polymers in language-based foundation models. These control experiments also reveal a…
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Advanced Polymer Synthesis and Characterization
