The Case for Multi-Version Experimental Evaluation (MVEE)
Simon Jörz, Felix Schuhknecht

TL;DR
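This paper introduces MVEE, a method that makes experimental evaluation in database research more reliable by accounting for build anomalies: cases where the compiler generates different code for a method across builds even though its source is unchanged. Instead of assuming a single compiled version of each method, MVEE includes multiple compiled versions in the evaluation.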
Contribution
It proposes MVEE, an approach that automatically detects build anomalies at the assembly level and incorporates multiple method versions into evaluations.
Findings
MVEE detects build anomalies across different builds.
Including multiple versions improves evaluation reliability.
MVEE increases the expressiveness of experimental comparisons.
Abstract
In the database community, we typically evaluate new methods based on experimental results, which we produce by integrating the proposed method along with a set of baselines in a single benchmarking codebase and measuring the individual runtimes. If we are unhappy with the performance of our method, we gradually improve it while repeatedly comparing to the baselines, until we outperform them. While this seems like a reasonable approach, it makes one delicate assumption: We assume that across the optimization workflow, there exists only a single compiled version of each baseline to compare to. However, we learned the hard way that in practice, even though the source code remains untouched, general purpose compilers might still generate highly different compiled code across builds, caused by seemingly unrelated changes in other parts of the codebase, leading to flawed comparisons and…
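To make the notion of a build anomaly concrete, the following is a minimal sketch of how differing compiled versions of an unchanged function could be detected across builds. It is an illustration only, not the authors' implementation: the function names (`normalize_asm`, `fingerprint`, `find_build_anomalies`) and the input format are hypothetical, and the listings are assumed to be per-function disassembly text with one address-prefixed instruction per line and no raw instruction bytes.

```python
import hashlib
import re


def normalize_asm(listing: str) -> list[str]:
    """Reduce a disassembly listing to its bare instructions.

    Assumption: each line looks like "4004f0: mov %rdi,%rax  # comment".
    Leading addresses and trailing '#' comments are dropped, because they
    can change between builds without the generated code changing.
    """
    instructions = []
    for raw in listing.splitlines():
        line = raw.split("#", 1)[0]                        # drop trailing comment
        line = re.sub(r"^\s*[0-9a-fA-F]+:\s*", "", line)   # drop leading address
        line = " ".join(line.split())                      # collapse whitespace
        if line:
            instructions.append(line)
    return instructions


def fingerprint(listing: str) -> str:
    """Hash the normalized instruction sequence of one compiled function."""
    text = "\n".join(normalize_asm(listing))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_build_anomalies(
    builds: dict[str, dict[str, str]],
) -> dict[str, list[list[str]]]:
    """Group builds by the compiled version they contain for each function.

    `builds` maps a build name to {function name: disassembly listing}.
    The result contains only functions that exist in more than one compiled
    version; each inner list holds the builds sharing one version.
    """
    versions: dict[str, dict[str, list[str]]] = {}
    for build_name, functions in builds.items():
        for function_name, listing in functions.items():
            by_hash = versions.setdefault(function_name, {})
            by_hash.setdefault(fingerprint(listing), []).append(build_name)
    return {
        function_name: list(by_hash.values())
        for function_name, by_hash in versions.items()
        if len(by_hash) > 1
    }
```

Each group of builds returned for a baseline corresponds to one compiled version that a multi-version evaluation would measure separately. This sketch is deliberately conservative: absolute jump or call targets embedded in operands also shift between builds, so a practical detector would have to normalize those as well before comparing.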
