Common Task Framework For a Critical Evaluation of Scientific Machine Learning Algorithms
Philippe Martin Wyder, Judah Goldfeder, Alexey Yermakov, Yue Zhao, Stefano Riva, Jan P. Williams, David Zoro, Amy Sara Rude, Matteo Tomasetto, Joe Germany, Joseph Bakarji, Georg Maierhofer, Miles Cranmer, J. Nathan Kutz

TL;DR
The paper introduces a Common Task Framework (CTF) for scientific machine learning to standardize evaluation, improve reproducibility, and facilitate fair comparison of algorithms across diverse scientific domains.
Contribution
It proposes a structured benchmark with datasets and metrics for evaluating scientific ML algorithms, inspired by successful CTFs in NLP and computer vision.
Findings
Benchmarking on the Kuramoto-Sivashinsky and Lorenz systems demonstrates the framework's effectiveness.
The CTF reveals strengths and limitations of different algorithms.
A community challenge on sea surface temperature data is planned.
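To make the idea of head-to-head scoring concrete, here is a minimal sketch that scores two naive forecasting baselines on a simulated Lorenz trajectory. Everything in it is an assumption for illustration: the train/test split, the baselines, the function names, and the `forecast_score` formula (a relative L2 error mapped so that 100 is a perfect forecast) are hypothetical and are not taken from the paper's datasets or metric definitions.

```python
import numpy as np


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system with the classic parameter values."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])


def simulate_lorenz(initial_state, dt=0.01, n_steps=1000):
    """Integrate the Lorenz system with a fixed-step fourth-order Runge-Kutta scheme."""
    traj = np.empty((n_steps + 1, 3))
    traj[0] = initial_state
    for k in range(n_steps):
        s = traj[k]
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        traj[k + 1] = s + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj


def forecast_score(truth, prediction):
    """Hypothetical score: 100 * (1 - relative L2 error); 100 means a perfect forecast."""
    rel_err = np.linalg.norm(prediction - truth) / np.linalg.norm(truth)
    return 100.0 * (1.0 - rel_err)


# Illustrative split (an assumption): first 800 states for training, the rest held out.
trajectory = simulate_lorenz(np.array([1.0, 1.0, 1.0]))
train, test = trajectory[:800], trajectory[800:]

# Baseline 1: persistence, which repeats the last observed training state.
persistence = np.repeat(train[-1:], len(test), axis=0)
# Baseline 2: climatology, which predicts the training mean at every step.
climatology = np.repeat(train.mean(axis=0, keepdims=True), len(test), axis=0)

scores = {
    "perfect": forecast_score(test, test),
    "persistence": forecast_score(test, persistence),
    "climatology": forecast_score(test, climatology),
}

for name, score in scores.items():
    print(f"{name:12s} score = {score:7.2f}")
```

Scoring every method against the same held-out data with the same function is what makes the numbers comparable. A learned model would simply contribute another entry to `scores`.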
Abstract
Machine learning (ML) is transforming modeling and control in the physical, engineering, and biological sciences. However, rapid development has outpaced the creation of standardized, objective benchmarks, leading to weak baselines, reporting bias, and inconsistent evaluations across methods. This undermines reproducibility, misguides resource allocation, and obscures scientific progress. To address this, we propose a Common Task Framework (CTF) for scientific machine learning. The CTF features a curated set of datasets and task-specific metrics spanning forecasting, state reconstruction, and generalization under realistic constraints, including noise and limited data. Inspired by the success of CTFs in fields like natural language processing and computer vision, our framework provides a structured, rigorous foundation for head-to-head evaluation of diverse algorithms. As a first step,…