All Emulators are Wrong, Many are Useful, and Some are More Useful Than Others: A Reproducible Comparison of Computer Model Surrogates
Kellin N. Rumsey, Graham C. Gibson, Devin Francom, Reid Morris

TL;DR
This paper provides a comprehensive, reproducible comparison of 29 surrogate modeling methods across diverse datasets and introduces the duqling framework to standardize benchmarking and guide practitioners.
Contribution
It introduces the duqling framework for reproducible emulator benchmarking and offers an empirical comparison of 29 methods across multiple datasets.
Findings
Empirical insights into strengths and weaknesses of state-of-the-art emulators.
Guidance for selecting surrogates based on dataset characteristics.
Framework enables reproducible and extendable emulator comparisons.
Abstract
Accurate and efficient surrogate modeling is essential for modern computational science, and there are a staggering number of emulation methods to choose from. With new methods being developed all the time, comparing the relative strengths and weaknesses of different methods remains a challenge due to inconsistent benchmarking practices and (sometimes) limited reproducibility and transparency. In this work, we present a large-scale, fully reproducible comparison of distinct emulators across canonical test functions and real emulation datasets. To facilitate rigorous, apples-to-apples comparisons, we introduce the R package duqling, which streamlines reproducible simulation studies using a consistent, simple syntax and automatic internal scaling of inputs. This framework allows researchers to compare emulators in a unified environment and makes it possible to…
