Toward a Benchmark Repository for Software Maintenance Tool Evaluations with Humans

Matúš Sulír

arXiv:2011.14751·cs.SE·December 1, 2020

TL;DR

This paper advocates for creating a standardized benchmark repository for evaluating software maintenance tools with human participants, aiming to improve comparability and fairness in experimental assessments.

Contribution

It discusses the requirements and challenges of a benchmark repository for human-based evaluations of software maintenance tools, and outlines steps toward its construction as an alternative to the current ad-hoc selection of projects and tasks.

Findings

01 Identifies challenges in current evaluation methods

02 Suggests a structured repository to improve comparability

03 Outlines steps for repository development

Abstract

To evaluate software maintenance techniques and tools in controlled experiments with human participants, researchers currently use projects and tasks selected on an ad-hoc basis. This can unrealistically favor their tool, and it makes the comparison of results difficult. We suggest a gradual creation of a benchmark repository with projects, tasks, and metadata relevant for human-based studies. In this paper, we discuss the requirements and challenges of such a repository, along with the steps which could lead to its construction.
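
As an illustration only, one entry of such a repository could be modeled as a project bundled with its tasks and study-relevant metadata. The sketch below is a hypothetical schema, not one taken from the paper: the class names `Task` and `BenchmarkEntry`, every field name, the helper `tasks_of_kind`, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Task:
    """One maintenance task given to a participant (hypothetical schema)."""
    task_id: str
    description: str
    kind: str  # illustrative label, e.g. "bug fix" or "feature addition"


@dataclass
class BenchmarkEntry:
    """A project with its tasks and study metadata (hypothetical schema)."""
    project_name: str
    language: str
    tasks: List[Task] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


def tasks_of_kind(entries: List[BenchmarkEntry], kind: str) -> List[Tuple[str, str]]:
    """Return (project_name, task_id) pairs for every task of the given kind."""
    return [
        (entry.project_name, task.task_id)
        for entry in entries
        for task in entry.tasks
        if task.kind == kind
    ]


# Invented sample data, used only to show how the structure fits together.
example = BenchmarkEntry(
    project_name="example-project",
    language="Java",
    tasks=[
        Task("T1", "Fix a wrong total in the report view", "bug fix"),
        Task("T2", "Add an export option", "feature addition"),
    ],
    metadata={"required_experience": "basic Java"},
)

bug_fix_tasks = tasks_of_kind([example], "bug fix")
```

Keeping tasks and metadata attached to the project lets two studies select the same project and task set by identifier, which is the kind of comparability the abstract asks for.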
