A Framework for Using LLMs for Repository Mining Studies in Empirical Software Engineering
Vincenzo de Martino, Joel Castaño, Fabio Palomba, Xavier Franch, Silverio Martínez-Fernández

TL;DR
This paper introduces PRIMES, a framework for improving the use of Large Language Models in software repository mining, focusing on prompt refinement, reproducibility, and reducing errors in empirical software engineering studies.
Contribution
It presents a novel, standardized framework for prompt engineering in LLM-based repository mining, enhancing reliability and reproducibility of empirical studies.
Findings
PRIMES checklist improves LLM output quality
Standardized prompt refinement enhances reproducibility
Iterative prompt comparison reduces errors
Abstract
Context: The emergence of Large Language Models (LLMs) has significantly transformed Software Engineering (SE) by providing innovative methods for analyzing software repositories. Objectives: Our objective is to establish a practical framework for SE researchers who need to improve their data collection and datasets when conducting software repository mining studies with LLMs. Method: This experience report shares insights from two previous repository mining studies, focusing on the methodologies used for creating, refining, and validating prompts that enhance the output of LLMs, particularly in the context of data collection in empirical studies. Results: Our research packages a framework, coined Prompt Refinement and Insights for Mining Empirical Software repositories (PRIMES), consisting of a checklist that can improve LLM usage performance, enhance output quality, and minimize…
Taxonomy
Topics: Data Mining Algorithms and Applications · Software Engineering Research · Semantic Web and Ontologies
