Can LLMs Replace Manual Annotation of Software Engineering Artifacts?
Toufique Ahmed, Premkumar Devanbu, Christoph Treude, Michael Pradel

TL;DR
This study investigates whether large language models (LLMs) can replace human annotators in software engineering evaluations. It finds that LLMs can reach agreement levels close to those of human annotators, and it proposes methods for identifying the tasks and samples where LLM annotation is suitable.
Contribution
The paper introduces an approach for substituting LLMs for human annotators in software engineering studies, including methods to predict which tasks are suitable and to select the samples an LLM can label.
Findings
LLMs can achieve agreement levels close to human annotators.
Model-model agreement predicts task suitability for LLMs.
Model confidence helps select samples where LLMs can replace humans.
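The last two findings can be illustrated with a minimal sketch. It is not the paper's implementation: the agreement metric (Cohen's kappa averaged over model pairs), the function names, and the confidence threshold are all assumptions chosen for illustration, since this summary does not state which metric or cutoff the authors use.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences (assumed metric)."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def mean_pairwise_agreement(labels_by_model):
    """Average kappa over all model pairs: a hypothetical task-suitability signal."""
    pairs = list(combinations(labels_by_model.values(), 2))
    if not pairs:
        raise ValueError("need labels from at least two models")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def split_by_confidence(confidences, threshold):
    """Route high-confidence samples to the LLM and the rest to human annotators."""
    llm_idx = [i for i, c in enumerate(confidences) if c >= threshold]
    human_idx = [i for i, c in enumerate(confidences) if c < threshold]
    return llm_idx, human_idx


# Toy data: three hypothetical models labelling four samples.
labels = {
    "model_a": [1, 1, 0, 0],
    "model_b": [1, 1, 0, 0],
    "model_c": [1, 0, 1, 0],
}
task_agreement = mean_pairwise_agreement(labels)
llm_samples, human_samples = split_by_confidence([0.9, 0.4, 0.75], threshold=0.75)
```

Under these assumptions, a task with high mean pairwise agreement would be a candidate for LLM annotation, and within such a task only the samples in `llm_samples` would be labelled by the model while `human_samples` stay with human annotators.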
Abstract
Experimental evaluations of software engineering innovations, e.g., tools and processes, often include human-subject studies as a component of a multi-pronged strategy to obtain greater generalizability of the findings. However, human-subject studies in our field are challenging, due to the cost and difficulty of finding and employing suitable subjects, ideally, professional programmers with varying degrees of experience. Meanwhile, large language models (LLMs) have recently started to demonstrate human-level performance in several areas. This paper explores the possibility of substituting costly human subjects with much cheaper LLM queries in evaluations of code and code-related artifacts. We study this idea by applying six state-of-the-art LLMs to ten annotation tasks from five datasets created by prior work, such as judging the accuracy of a natural language summary of a method or…
Taxonomy
Topics: Business Process Modeling and Analysis · Model-Driven Software Engineering Techniques · Semantic Web and Ontologies
