Living Synthetic Benchmarks: A Neutral and Cumulative Framework for Simulation Studies
František Bartoš, Samuel Pawel, Björn S. Siepe

TL;DR
This paper proposes living synthetic benchmarks to improve the neutrality, comparability, and reproducibility of simulation studies in statistical method evaluation by continuously updating and disentangling methods and data-generating mechanisms.
Contribution
It introduces a framework for maintaining dynamic, neutral benchmarks that separate method development from the development of simulation studies (data-generating mechanisms and performance measures), facilitating systematic comparisons and cumulative progress.
Findings
A prototype benchmark for publication bias methods demonstrates feasibility.
Living benchmarks enable continuous updates and systematic comparisons.
Framework promotes neutrality and reproducibility in simulation studies.
Abstract
Simulation studies are widely used to evaluate statistical methods. However, new methods are often introduced and evaluated using data-generating mechanisms (DGMs) devised by the same authors. This coupling creates misaligned incentives, e.g., the need to demonstrate the superiority of new methods, potentially compromising the neutrality of simulation studies. Furthermore, results of simulation studies are often difficult to compare due to differences in DGMs, competing methods, and performance measures. This fragmentation can lead to conflicting conclusions, hinder methodological progress, and delay the adoption of effective methods. To address these challenges, we introduce the concept of living synthetic benchmarks. The key idea is to disentangle method and simulation study development and continuously update the benchmark whenever a new DGM, method, or performance measure becomes…
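The key idea in the abstract, keeping methods, DGMs, and performance measures separate and updating the results whenever any one of them is added, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `LivingBenchmark`, its method names, and the toy mean-estimation setting are all hypothetical and chosen only for illustration (the paper's prototype concerns publication bias methods).

```python
import random
import statistics


class LivingBenchmark:
    """Hypothetical registry keeping DGMs, methods, and measures separate."""

    def __init__(self, n_reps=200, seed=1):
        self.dgms = {}      # name -> (simulate(rng) -> dataset, true value)
        self.methods = {}   # name -> estimate(dataset) -> float
        self.measures = {}  # name -> score(estimates, truth) -> float
        self.n_reps = n_reps
        self.seed = seed
        self.results = {}   # (dgm, method, measure) -> value

    def add_dgm(self, name, simulate, truth):
        self.dgms[name] = (simulate, truth)
        self._update()

    def add_method(self, name, estimate):
        self.methods[name] = estimate
        self._update()

    def add_measure(self, name, score):
        self.measures[name] = score
        self._update()

    def _update(self):
        """Fill in only the missing cells; existing results stay untouched."""
        for d, (simulate, truth) in self.dgms.items():
            for m, estimate in self.methods.items():
                missing = [k for k in self.measures
                           if (d, m, k) not in self.results]
                if not missing:
                    continue
                # Seed depends on the DGM only, so every method is
                # evaluated on the same simulated datasets.
                rng = random.Random(f"{self.seed}-{d}")
                estimates = [estimate(simulate(rng))
                             for _ in range(self.n_reps)]
                for k in missing:
                    self.results[(d, m, k)] = self.measures[k](estimates, truth)


def bias(estimates, truth):
    return statistics.fmean(estimates) - truth


def rmse(estimates, truth):
    return statistics.fmean((e - truth) ** 2 for e in estimates) ** 0.5


# Toy example (assumed setting): estimate the mean of a normal sample.
bench = LivingBenchmark(n_reps=50)
bench.add_dgm("normal_n30",
              lambda rng: [rng.gauss(0.5, 1.0) for _ in range(30)],
              truth=0.5)
bench.add_method("mean", statistics.fmean)
bench.add_measure("bias", bias)
snapshot = dict(bench.results)

# Later contributions extend the benchmark without redefining earlier parts.
bench.add_method("median", statistics.median)
bench.add_measure("rmse", rmse)
```

In this sketch, a newly contributed method is automatically evaluated on every registered DGM and measure, and earlier results are reused rather than recomputed, which is one simple way to make comparisons cumulative.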
Taxonomy
Topics: Simulation Techniques and Applications · Advanced Causal Inference Techniques · Data Analysis with R
