A Comparison of Conversational Models and Humans in Answering Technical Questions: the Firefox Case
Joao Correia, Daniel Coutinho, Marco Castelluccio, Caio Barbosa, Rafael de Mello, Anita Sarma, Alessandro Garcia, Marco Gerosa, and Igor Steinmacher

TL;DR
This study compares human developers, GPT, and RAG-enhanced GPT in answering Mozilla Firefox developer questions. RAG-assisted responses were more comprehensive than human answers and nearly as helpful, but they were verbose and would benefit from being shortened.
Contribution
It provides an empirical evaluation of RAG-enhanced language models in real-world software development support within an open source project.
Findings
RAG responses are more comprehensive than human answers.
RAG responses are nearly as helpful as human responses.
RAG responses tend to be verbose and could be shortened.
Abstract
The use of Large Language Models (LLMs) to support tasks in software development has steadily increased over recent years, from assisting developers in coding activities to providing conversational agents that answer newcomers' questions. In collaboration with the Mozilla Foundation, this study evaluates the effectiveness of Retrieval-Augmented Generation (RAG) in assisting developers within the Mozilla Firefox project. We conducted an empirical analysis comparing responses from human developers, a standard GPT model, and a GPT model enhanced with RAG, using real queries from Mozilla's developer chat rooms. To ensure a rigorous evaluation, Mozilla experts assessed the responses based on helpfulness, comprehensiveness, and conciseness. The results show that RAG-assisted responses were more comprehensive than those of human developers (62.50% to 54.17%) and almost as helpful (75.00% to 79.17%),…
