Software Artifact Mining in Software Engineering Conferences: A Meta-Analysis
Zeinab Abou Khalil (DGD-I, Inria), Stefano Zacchiroli (LTCI, IP Paris)

TL;DR
This meta-analysis examines 16 years of empirical software engineering research, revealing trends in the types of software artifacts mined, with source code and test data being most common, and an increasing interest in novel artifacts.
Contribution
It provides a comprehensive quantitative overview of artifact mining practices in ESE research, highlighting evolving trends and research focuses.
Findings
Source code and test data are the most frequently mined artifacts.
Mining activities are present in the majority of papers analyzed.
Interest in mining new artifacts alongside source code is increasing.
Abstract
Background: Software development results in the production of various types of artifacts: source code, version control system metadata, bug reports, mailing list conversations, test data, etc. Empirical software engineering (ESE) has thrived mining those artifacts to uncover the inner workings of software development and improve its practices. But which artifacts are studied in the field is a moving target, which we study empirically in this paper. Aims: We quantitatively characterize the most frequently mined and co-mined software artifacts in ESE research and the research purposes they support. Method: We conduct a meta-analysis of artifact mining studies published in 11 top conferences in ESE, for a total of 9621 papers. We use natural language processing (NLP) techniques to characterize the types of software artifacts that are most often mined and their evolution over a 16-year period…
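To make "most frequently mined and co-mined" concrete, here is a minimal sketch of how artifact mentions and their co-occurrences could be counted across paper texts. It uses plain keyword matching as a stand-in; the abstract only says NLP techniques were used and does not specify them. The `ARTIFACT_KEYWORDS` dictionary and the function names `artifacts_in` and `count_mined` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists (an assumption): the vocabulary actually used
# in the paper is not given in the abstract.
ARTIFACT_KEYWORDS = {
    "source code": ["source code"],
    "bug reports": ["bug report", "issue report"],
    "test data": ["test case", "test suite", "test data"],
    "mailing lists": ["mailing list"],
    "VCS metadata": ["commit", "version control"],
}


def artifacts_in(text):
    """Return the set of artifact types whose keywords appear in the text."""
    lowered = text.lower()
    return {
        name
        for name, keywords in ARTIFACT_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }


def count_mined(texts):
    """Count, per artifact type and per pair of types, how many texts mention them."""
    single = Counter()
    pairs = Counter()
    for text in texts:
        found = artifacts_in(text)
        single.update(found)
        # Sorting makes each unordered pair map to a single tuple key.
        pairs.update(combinations(sorted(found), 2))
    return single, pairs


if __name__ == "__main__":
    sample = [
        "We mine source code and bug reports from open source projects.",
        "Commits and source code are analyzed to study refactoring.",
        "A survey of developers on code review practices.",
    ]
    single, pairs = count_mined(sample)
    print(single.most_common())
    print(pairs.most_common())
```

Counting each artifact type at most once per text keeps long papers from dominating the totals, and grouping the same counts by publication year would give the evolution over time that the abstract refers to.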
