Investigating the Use of LLMs for Evidence Briefings Generation in Software Engineering
Mauro Marcelino, Marcos Alves, Bianca Trinkenreich, Bruno Cartaxo, Sérgio Soares, Simone D.J. Barbosa, Marcos Kalinowski

TL;DR
This paper proposes an experimental protocol to evaluate the effectiveness of LLM-generated evidence briefings in software engineering, comparing them to human-created briefings in terms of content fidelity, understandability, and usefulness.
Contribution
It introduces a RAG-based LLM tool for automatic evidence briefing generation and a controlled experiment framework for assessing its quality against human-produced briefings.
Findings
Evaluation results are pending until the controlled experiment is run.
LLM-generated briefings could reduce the manual effort of creating evidence briefings.
The experiment is expected to yield insights into LLM capabilities for technical summarization.
Abstract
[Context] An evidence briefing is a concise and objective transfer medium that can present the main findings of a study to software engineers in the industry. Although practitioners and researchers have deemed evidence briefings useful, their production requires manual labor, which may be a significant challenge to their broad adoption. [Goal] The goal of this registered report is to describe an experimental protocol for evaluating LLM-generated evidence briefings for secondary studies in terms of content fidelity, ease of understanding, and usefulness, as perceived by researchers and practitioners, compared to human-made briefings. [Method] We developed a RAG-based LLM tool to generate evidence briefings. We used the tool to automatically regenerate two evidence briefings that had been produced manually in previous research efforts. We designed a controlled experiment to evaluate how…
