Mining Software Repositories with a Collaborative Heuristic Repository
Hlib Babii, Julian Aron Prenner, Laurin Stricker, Anjan Karmakar, Andrea Janes, Romain Robbes

TL;DR
This paper proposes a collaborative repository of heuristics for categorizing software engineering artifacts. Weak supervision techniques combine these diverse heuristics to train more accurate classifiers, and an initial version of the repository is applied to commit classification.
Contribution
It presents a novel collaborative repository for heuristics that improves artifact categorization in software engineering using weak supervision.
Findings
Improved commit classification accuracy.
Diverse heuristics enhance weak supervision effectiveness.
Repository facilitates community-driven heuristic sharing.
Abstract
Many software engineering studies or tasks rely on categorizing software engineering artifacts. In practice, this is done either by defining simple but often imprecise heuristics, or by manually labelling the artifacts. Unfortunately, errors in these categorizations impact the tasks that rely on them. To improve the precision of these categorizations, we propose to gather heuristics in a collaborative heuristic repository, to which researchers can contribute a large number of diverse heuristics for a variety of tasks on a variety of SE artifacts. These heuristics are then leveraged by state-of-the-art weak supervision techniques to train high-quality classifiers, thus improving the categorizations. We present an initial version of the heuristic repository, which we applied to the concrete task of commit classification.
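To make the weak-supervision idea concrete, here is a minimal sketch, assuming a binary "bug fix vs. not bug fix" commit classification task. The heuristics (`keyword_fix`, `issue_reference`, `docs_or_feature`) and the label constants are hypothetical illustrations, not taken from the paper's repository. Simple majority voting stands in for the state-of-the-art label models the abstract refers to, which instead estimate how accurate and how correlated each heuristic is before combining them.

```python
import re
from collections import Counter
from typing import Callable, List

# Hypothetical label values for a binary commit classification task.
ABSTAIN = -1
NOT_BUGFIX = 0
BUGFIX = 1

# A heuristic (often called a "labeling function") maps a commit message
# to a label, or to ABSTAIN when it has no opinion.
Heuristic = Callable[[str], int]


def keyword_fix(message: str) -> int:
    """Hypothetical heuristic: messages mentioning 'fix' or 'bug' are bug fixes."""
    text = message.lower()
    return BUGFIX if "fix" in text or "bug" in text else ABSTAIN


def issue_reference(message: str) -> int:
    """Hypothetical heuristic: an issue reference such as '#123' hints at a bug fix."""
    return BUGFIX if re.search(r"#\d+", message) else ABSTAIN


def docs_or_feature(message: str) -> int:
    """Hypothetical heuristic: docs or feature commits are not bug fixes."""
    text = message.lower()
    if text.startswith(("docs", "add", "feat", "implement")):
        return NOT_BUGFIX
    return ABSTAIN


def majority_vote(message: str, heuristics: List[Heuristic]) -> int:
    """Combine non-abstaining votes; return ABSTAIN on no votes or a tie."""
    votes = [h(message) for h in heuristics]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    # Ties are treated as ABSTAIN rather than broken arbitrarily.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


HEURISTICS: List[Heuristic] = [keyword_fix, issue_reference, docs_or_feature]

if __name__ == "__main__":
    commits = [
        "Fix null pointer in parser (#42)",  # two BUGFIX votes -> 1
        "Add support for YAML config",       # one NOT_BUGFIX vote -> 0
        "Refactor build scripts",            # no votes -> -1 (ABSTAIN)
        "docs: fix typo",                    # conflicting votes -> -1 (ABSTAIN)
    ]
    for msg in commits:
        print(f"{msg!r} -> {majority_vote(msg, HEURISTICS)}")
```

The resulting labels, with ABSTAIN rows dropped, would then serve as noisy training data for an ordinary classifier over commit messages. Contributing more diverse heuristics increases how many commits receive a vote at all, which is one reason a shared repository of heuristics is useful for this setup.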
