Toward a Benchmark Repository for Software Maintenance Tool Evaluations with Humans
Matúš Sulír

TL;DR
This paper advocates for creating a standardized benchmark repository for evaluating software maintenance tools with human participants, aiming to improve comparability and fairness in experimental assessments.
Contribution
It discusses the requirements and challenges of a benchmark repository for human-based software maintenance tool evaluations and outlines steps toward building one, as an alternative to the current ad-hoc selection of projects and tasks.
Findings
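Identifies problems with current evaluation practice: ad-hoc selection of projects and tasks can unrealistically favor the evaluated tool and makes results hard to compare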
Suggests a structured repository to improve comparability
Outlines steps for repository development
Abstract
To evaluate software maintenance techniques and tools in controlled experiments with human participants, researchers currently use projects and tasks selected on an ad-hoc basis. This can unrealistically favor their tool, and it makes the comparison of results difficult. We suggest a gradual creation of a benchmark repository with projects, tasks, and metadata relevant for human-based studies. In this paper, we discuss the requirements and challenges of such a repository, along with the steps which could lead to its construction.
