Extrapolative ML Models for Copolymers
Israrul H. Hashmi, Himanshu, Rahul Karmakar, Tarak K Patra

TL;DR
This paper investigates how different machine learning models, especially neural networks and XGBoost, perform in predicting copolymer properties outside their training data range, emphasizing the importance of training data volume and diversity.
Contribution
It systematically analyzes the extrapolation capabilities of various ML models for copolymer property prediction, highlighting the impact of training data size and range on model performance.
Findings
Neural networks and XGBoost show strong extrapolation when trained on diverse data.
Tree search algorithms are inefficient for extrapolative tasks.
Training data volume and range critically influence ML model extrapolation ability.
Abstract
Machine learning models have been progressively used for predicting materials properties. These models can be built using pre-existing data and are useful for rapidly screening the physicochemical space of a material, which is astronomically large. However, ML models are inherently interpolative, and their efficacy for searching candidates outside a material's known range of property is unresolved. Moreover, the performance of an ML model is intricately connected to its learning strategy and the volume of training data. Here, we determine the relationship between the extrapolation ability of an ML model, the size and range of its training dataset, and its learning approach. We focus on a canonical problem of predicting the properties of a copolymer as a function of the sequence of its monomers. Tree search algorithms, which learn the similarity between polymer structures, are found to…
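The extrapolation setting described in the abstract can be illustrated with a minimal sketch: train only on sequences whose property falls in the lower part of the known range, then test on the held-out upper part. Everything in this sketch is an assumption made for illustration and is not taken from the paper: the two-monomer alphabet, the `toy_property` function (a count of adjacent like-monomer pairs), the 80/20 cut, and the 1-nearest-neighbour predictor, which stands in for a generic similarity-based learner rather than any model the authors evaluated.

```python
import random


def toy_property(seq):
    """Hypothetical stand-in property: number of adjacent like-monomer pairs."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a == b)


def extrapolative_split(seqs, values, train_fraction=0.8):
    """Train on the lowest-valued fraction; hold out the highest-valued rest."""
    order = sorted(range(len(seqs)), key=lambda i: values[i])
    cut = int(train_fraction * len(order))
    train = [(seqs[i], values[i]) for i in order[:cut]]
    test = [(seqs[i], values[i]) for i in order[cut:]]
    return train, test


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_predict(train, seq):
    """Return the property of the training sequence closest to `seq`."""
    return min(train, key=lambda item: hamming(item[0], seq))[1]


# Assumed toy dataset: 500 random A/B copolymers of length 20.
random.seed(0)
seqs = ["".join(random.choice("AB") for _ in range(20)) for _ in range(500)]
values = [toy_property(s) for s in seqs]

train, test = extrapolative_split(seqs, values)
train_max = max(v for _, v in train)

# Predict on the held-out, higher-valued sequences only.
preds = [nearest_neighbor_predict(train, s) for s, _ in test]
mae = sum(abs(p - v) for p, (_, v) in zip(preds, test)) / len(test)

print(f"max training value: {train_max}")
print(f"max prediction on held-out range: {max(preds)}")
print(f"mean absolute error on held-out range: {mae:.2f}")
```

Because a nearest-neighbour predictor can only return values it has already seen, its predictions never exceed the largest training value. This is one simple way to see why a purely similarity-based learner struggles outside its training range. Whether a given neural network or gradient-boosted model does better in this setting would have to be checked by swapping it in for `nearest_neighbor_predict`.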
Taxonomy
Topics: Fuel Cells and Related Materials
Methods: Focus
