Scaling Law of Sim2Real Transfer Learning in Expanding Computational Materials Databases for Real-World Predictions
Shunya Minami, Yoshihiro Hayashi, Stephen Wu, Kenji Fukumizu, Hiroki Sugisawa, Masashi Ishii, Isao Kuwajima, Kazuya Shiratori, Ryo Yoshida

TL;DR
This paper shows that, in materials science, the prediction error on real systems decreases as a power law of the computational database size, a relationship that can guide database development for real-world applications.
Contribution
It establishes the scaling law of Sim2Real transfer learning, providing insights into database size effects on prediction accuracy in materials science.
Findings
Prediction error decreases with computational data size following a power-law.
Scaling behavior helps determine necessary sample sizes for target performance.
Insights guide design of data protocols for real-world materials predictions.
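The second finding can be illustrated with a minimal sketch. It assumes a pure power law of the form error = D * n^(-alpha), which is a simplification chosen here for illustration and not necessarily the functional form used in the paper; a real curve may level off at a nonzero floor, in which case this form would not apply. The function names, the constants D and alpha, and the data points are all hypothetical and synthetic.

```python
import math

def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(D) - alpha * log(n).

    Assumes error = D * n**(-alpha) with no offset term (an assumption
    made for this sketch, not a form confirmed by the source).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    alpha = -slope                      # decay exponent
    D = math.exp(y_mean - slope * x_mean)  # prefactor
    return D, alpha

def required_size(D, alpha, target_error):
    """Invert error = D * n**(-alpha) to get the database size n."""
    return (D / target_error) ** (1.0 / alpha)

# Synthetic, illustrative values only (not taken from the paper).
sizes = [100, 1000, 10000, 100000]
errors = [2.0 * n ** -0.25 for n in sizes]

D, alpha = fit_power_law(sizes, errors)
n_needed = required_size(D, alpha, target_error=0.1)
print(f"D={D:.3f}, alpha={alpha:.3f}, n_needed={n_needed:.0f}")
```

Fitting in log-log space keeps the sketch dependency-free. If the observed errors flatten at large n, a nonlinear fit with an added constant term would be the more appropriate choice.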
Abstract
To address the challenge of limited experimental materials data, extensive physical property databases are being developed based on high-throughput computational experiments, such as molecular dynamics simulations. Previous studies have shown that fine-tuning a predictor pretrained on a computational database to a real system can result in models with outstanding generalization capabilities compared to learning from scratch. This study demonstrates the scaling law of simulation-to-real (Sim2Real) transfer learning for several machine learning tasks in materials science. Case studies of three prediction tasks for polymers and inorganic materials reveal that the prediction error on real systems decreases according to a power-law as the size of the computational data increases. Observing the scaling behavior offers various insights for database development, such as determining the sample…
Taxonomy
Topics: Machine Learning in Materials Science
