Omics-scale polymer computational database transferable to real-world artificial intelligence applications
Ryo Yoshida, Yoshihiro Hayashi, Hidemine Furuya, Ryohei Hosoya, Kazuyoshi Kaneko, Hiroki Sugisawa, Yu Kaneko, Aiko Takahashi, Yoh Noguchi, Shun Nanjo, Keiko Shinoda, Tomu Hamakawa, Mitsuru Ohno, Takuya Kitamura, Misaki Yonekawa, Stephen Wu, Masato Ohnishi, Chang Liu

TL;DR
This paper introduces PolyOmics, a large-scale computational polymer database generated via automated simulations, enabling improved AI models for polymer property prediction and bridging the gap between computational and experimental materials science.
Contribution
The creation of PolyOmics, an extensive, collaboratively developed computational database for polymers, and a demonstration of its effectiveness in enhancing machine learning models for real-world applications.
Findings
Model generalization improves with database size, following power-law scaling
Pretrained models can be fine-tuned with limited experimental data
Ultralarge-scale data reveals unexplored polymer regions
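
The power-law scaling mentioned in the findings can be illustrated with a short sketch. It assumes the common form error ≈ a · N^(-alpha), where N is the number of training samples; this summary does not state the paper's exact functional form, so the form, the helper name `fit_power_law`, and the synthetic numbers below are illustrative assumptions, not PolyOmics results.

```python
import math

def fit_power_law(sizes, errors):
    """Fit error = a * size**(-alpha) by least squares in log-log space.

    Hypothetical helper for illustration; not taken from the paper.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Ordinary least-squares slope and intercept of log(error) vs log(size).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A power law is a straight line in log-log space with slope -alpha.
    return math.exp(intercept), -slope

# Synthetic data generated from an assumed law (a = 2.0, alpha = 0.3).
sizes = [100, 1_000, 10_000, 100_000]
errors = [2.0 * n ** -0.3 for n in sizes]

a, alpha = fit_power_law(sizes, errors)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

With real learning curves the points would scatter around the fitted line, and the estimated exponent indicates how quickly test error might fall as more simulated polymers are added.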
Abstract
Developing large-scale foundational datasets is a critical milestone in advancing artificial intelligence (AI)-driven scientific innovation. However, unlike AI-mature fields such as natural language processing, materials science, particularly polymer research, has significantly lagged in developing extensive open datasets. This lag is primarily due to the high costs of polymer synthesis and property measurements, along with the vastness and complexity of the chemical space. This study presents PolyOmics, an omics-scale computational database generated through fully automated molecular dynamics simulation pipelines that provide diverse physical properties for over polymeric materials. The PolyOmics database is collaboratively developed by approximately 260 researchers from 48 institutions to bridge the gap between academia and industry. Machine learning models pretrained on…
Taxonomy
Topics: Machine Learning in Materials Science · Block Copolymer Self-Assembly · Catalysis and Oxidation Reactions
