Does AI for science need another ImageNet or totally different benchmarks? A case study of machine learning force fields
Yatao Li, Wanling Gao, Lei Wang, Lixin Sun, Zun Wang, Jianfeng Zhan

TL;DR
This paper argues that AI for science requires specialized benchmarks that reflect real-world scientific challenges, and supports the argument with a case study on machine learning force fields for molecular dynamics simulations.
Contribution
It proposes a benchmarking approach tailored to AI4S that evaluates models on sample efficiency, time sensitivity, and generalization, so that benchmark results better reflect how models behave in scientific applications.
Findings
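Traditional benchmarks are inadequate for AI4S: they assume training, test, and real-world data are independent and identically distributed, whereas AI4S workloads anticipate out-of-distribution problem instances.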
Proposed metrics better assess models' real-world scientific performance.
The benchmark suite enhances evaluation of ML models in scientific contexts.
Abstract
AI for science (AI4S) is an emerging research field that aims to enhance the accuracy and speed of scientific computing tasks using machine learning methods. Traditional AI benchmarking methods struggle to adapt to the unique challenges posed by AI4S because they assume data in training, testing, and future real-world queries are independent and identically distributed, while AI4S workloads anticipate out-of-distribution problem instances. This paper investigates the need for a novel approach to effectively benchmark AI for science, using the machine learning force field (MLFF) as a case study. MLFF is a method to accelerate molecular dynamics (MD) simulation with low computational cost and high accuracy. We identify various missed opportunities in scientifically meaningful benchmarking and propose solutions to evaluate MLFF models, specifically in the aspects of sample efficiency, time…
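The gap between in-distribution and out-of-distribution evaluation described in the abstract can be illustrated with a toy sketch. Nothing below comes from the paper: the Lennard-Jones pair force stands in for a reference calculation, a 1-nearest-neighbor lookup stands in for a learned force field, and the distance ranges and function names (`lj_force`, `nearest_neighbor_predict`, `mean_abs_error`) are assumptions chosen for illustration only.

```python
# Toy illustration (not the paper's benchmark): a "model" that looks accurate
# on test points drawn from the training range can fail badly outside it.

def lj_force(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones radial force F(r) = -dU/dr, U = 4*eps*((s/r)^12 - (s/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r

def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]

def nearest_neighbor_predict(train_r, train_f, r):
    """Stand-in surrogate model: return the force of the closest training distance."""
    idx = min(range(len(train_r)), key=lambda i: abs(train_r[i] - r))
    return train_f[idx]

def mean_abs_error(train_r, train_f, test_r):
    """Mean absolute force error of the surrogate against the reference force."""
    errors = [
        abs(nearest_neighbor_predict(train_r, train_f, r) - lj_force(r))
        for r in test_r
    ]
    return sum(errors) / len(errors)

# Training data covers pair distances in [1.0, 1.6] (hypothetical range).
train_r = linspace(1.0, 1.6, 200)
train_f = [lj_force(r) for r in train_r]

# In-distribution test: midpoints between neighboring training distances.
step = train_r[1] - train_r[0]
iid_test_r = [r + step / 2.0 for r in train_r[:-1]]

# Out-of-distribution test: shorter distances never seen during training,
# where the repulsive force rises steeply.
ood_test_r = linspace(0.90, 0.99, 50)

iid_mae = mean_abs_error(train_r, train_f, iid_test_r)
ood_mae = mean_abs_error(train_r, train_f, ood_test_r)

print(f"in-distribution MAE:     {iid_mae:.4f}")
print(f"out-of-distribution MAE: {ood_mae:.4f}")
```

In this sketch the in-distribution error alone would suggest the surrogate is reliable, while the short-distance configurations expose a much larger error. A benchmark that samples its test set only from the training range would not reveal that difference.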
Taxonomy
Topics: Machine Learning in Materials Science · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
