Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results
Christina Nießl (1), Moritz Herrmann (2), Chiara Wiedemann (1), Giuseppe Casalicchio (2), Anne-Laure Boulesteix (1) ((1) Institute for Medical Information Processing, Biometry, and Epidemiology, LMU Munich, Germany; (2) Department of Statistics, LMU Munich, Germany)

TL;DR
This paper highlights how flexible choices in benchmark study design and analysis can lead to biased, overly optimistic results, emphasizing the need for careful, transparent research practices.
Contribution
It demonstrates the impact of multiple design and analysis options on benchmark results and advocates for awareness and transparency to improve reliability.
Findings
Benchmark results can vary substantially depending on design and analysis choices.
Multidimensional unfolding can assess the impact of analysis options.
Questionable practices can bias interpretations of benchmark studies.
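To make the first finding concrete, here is a minimal sketch of how one analysis choice, the way performance values are aggregated over data sets, can change which method looks best. The error rates and method names are hypothetical and chosen purely for illustration; they are not taken from the paper, and the ranking helper is an assumption that ignores ties.

```python
from statistics import mean, median

# Hypothetical error rates (lower is better) on five data sets.
# These numbers are illustrative only and do not come from the paper.
errors = {
    "A": [0.10, 0.12, 0.11, 0.13, 0.60],
    "B": [0.15, 0.16, 0.14, 0.17, 0.20],
}


def mean_rank(scores):
    """Average per-data-set rank of each method (1 = lowest error; ties not handled)."""
    methods = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {m: 0 for m in methods}
    for i in range(n_datasets):
        # Order the methods by their error on data set i.
        ordered = sorted(methods, key=lambda m: scores[m][i])
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    return {m: totals[m] / n_datasets for m in methods}


# Three common ways to aggregate performance over data sets.
aggregations = {
    "mean": {m: mean(v) for m, v in errors.items()},
    "median": {m: median(v) for m, v in errors.items()},
    "mean rank": mean_rank(errors),
}

# For every aggregation, the method with the smallest aggregated value wins.
winners = {name: min(vals, key=vals.get) for name, vals in aggregations.items()}

for name, winner in winners.items():
    print(f"{name}: best method = {winner}")
```

With these made-up values, method B has the lower mean error because A performs poorly on a single data set, while A is ahead on the median and on the mean rank. Reporting only the aggregation that favours a preferred method would be an instance of the selective reporting the abstract warns about.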
Abstract
In recent years, the need for neutral benchmark studies that focus on the comparison of methods from computational sciences has been increasingly recognised by the scientific community. While general advice on the design and analysis of neutral benchmark studies can be found in recent literature, certain amounts of flexibility always exist. This includes the choice of data sets and performance measures, the handling of missing performance values and the way the performance values are aggregated over the data sets. As a consequence of this flexibility, researchers may be concerned about how their choices affect the results or, in the worst case, may be tempted to engage in questionable research practices (e.g. the selective reporting of results or the post-hoc modification of design or analysis components) to fit their expectations or hopes. To raise awareness for this issue, we use an…
