Evaluation-as-a-Service: Overview and Outlook
Allan Hanbury, Henning Müller, Krisztian Balog, Torben Brodt, Gordon V. Cormack, Ivan Eggel, Tim Gollub, Frank Hopfgartner, Jayashree Kalpathy-Cramer, Noriko Kando, Anastasia Krithara, Jimmy Lin, Simon Mercer, Martin Potthast

TL;DR
This paper reviews Evaluation-as-a-Service (EaaS), a paradigm shift in empirical evaluation that keeps data centralized and accessible via APIs or VMs, addressing challenges like large, confidential, and dynamic datasets.
Contribution
It provides an overview of existing EaaS approaches, analyzes their usage scenarios, and discusses future directions for sustainable research infrastructures.
Findings
EaaS enables handling of large, confidential, and dynamic datasets.
The different EaaS approaches, such as access via APIs or via VMs, each have their own advantages and disadvantages.
Stakeholders include funding agencies, challenge organizers, researchers, and industry.
Abstract
Evaluation in empirical computer science is essential to show progress and to assess the technologies developed. Several research domains, such as information retrieval, have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted up until now. In recent years, however, several new challenges have emerged that do not fit this paradigm very well: extremely large data sets, confidential data sets as found in the medical domain, and rapidly changing data sets as often encountered in industry. Crowdsourcing has also changed the way that industry approaches problem-solving, with companies now organizing challenges and handing out monetary awards to incentivize people to work on them, particularly in the field of machine learning. This white…
Taxonomy
Topics: Machine Learning and Data Classification · Scientific Computing and Data Management · Software Engineering Research
