Can LLMs Replace Manual Annotation of Software Engineering Artifacts?

Toufique Ahmed; Premkumar Devanbu; Christoph Treude; Michael Pradel

arXiv:2408.05534 · cs.SE · February 6, 2025 · 2 citations

Open Access

TL;DR

This study investigates replacing human annotators with large language models (LLMs) in software engineering evaluations. It finds that LLM annotations can match or come close to the agreement levels among human annotators, and it proposes methods to identify the tasks and samples for which LLMs are suitable.

Contribution

The paper proposes replacing human annotation with LLM annotation in software engineering studies, together with methods for predicting whether a task is suitable for LLMs and for selecting the samples on which LLMs can stand in for humans.
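Model-model agreement can be quantified with a chance-corrected statistic. The following minimal sketch uses Cohen's kappa averaged over every pair of models. This choice is an assumption made for illustration, and the paper's own agreement measure and decision rule may differ. The helper names (`cohen_kappa`, `mean_pairwise_kappa`, `task_looks_suitable`), the model names, and the 0.6 threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two raters over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement implied by each rater's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label everywhere; treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(labels_by_model: dict[str, list[str]]) -> float:
    """Average Cohen's kappa over every pair of models (needs at least two)."""
    pairs = list(combinations(labels_by_model.values(), 2))
    if not pairs:
        raise ValueError("need labels from at least two models")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def task_looks_suitable(labels_by_model: dict[str, list[str]], threshold: float = 0.6) -> bool:
    """Hypothetical rule: call a task LLM-suitable if the models agree with each other."""
    return mean_pairwise_kappa(labels_by_model) >= threshold


# Usage: three hypothetical models judging four summaries as accurate ("yes") or not ("no").
labels = {
    "model_a": ["yes", "yes", "no", "no"],
    "model_b": ["yes", "no", "no", "no"],
    "model_c": ["yes", "yes", "no", "no"],
}
print(round(mean_pairwise_kappa(labels), 3))  # 0.667
print(task_looks_suitable(labels))  # True
```

The intuition behind the sketch is that if independently queried models disagree with one another on a task, their labels are unlikely to be a reliable substitute for human judgment there. Kappa is used instead of raw percent agreement so that skewed label distributions do not inflate the score.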

Findings

1. LLMs can achieve agreement levels close to those of human annotators.
2. Model-model agreement predicts whether a task is suitable for LLM annotation.
3. Model confidence helps select the samples on which LLMs can replace humans.
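Below is a minimal sketch of confidence-based sample selection. It assumes that each LLM label comes with a confidence score in [0, 1], such as the probability the model assigns to its chosen label. `ModelAnnotation`, `split_by_confidence`, the item IDs, and the 0.9 threshold are hypothetical illustrations, not the paper's procedure. Items at or above the threshold keep the LLM label, and the rest are routed to human annotators.

```python
from dataclasses import dataclass


@dataclass
class ModelAnnotation:
    item_id: str
    label: str
    confidence: float  # assumed: model's probability for its chosen label, in [0, 1]


def split_by_confidence(
    annotations: list[ModelAnnotation], threshold: float = 0.9
) -> tuple[list[ModelAnnotation], list[str]]:
    """Keep confident LLM labels; return the IDs of items that still need humans."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    accepted = [a for a in annotations if a.confidence >= threshold]
    # Least confident first, so humans can start with the most doubtful items.
    needs_human = [
        a.item_id
        for a in sorted(annotations, key=lambda a: a.confidence)
        if a.confidence < threshold
    ]
    return accepted, needs_human


annotations = [
    ModelAnnotation("m1", "accurate", 0.97),
    ModelAnnotation("m2", "inaccurate", 0.55),
    ModelAnnotation("m3", "accurate", 0.91),
    ModelAnnotation("m4", "accurate", 0.72),
]
accepted, needs_human = split_by_confidence(annotations)
print([a.item_id for a in accepted])  # ['m1', 'm3']
print(needs_human)  # ['m2', 'm4']
```

Raising the threshold sends more items to humans in exchange for fewer accepted LLM errors. A fixed budget, such as replacing only the most confident fraction of items, is an alternative design under the same assumption.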

Abstract

Experimental evaluations of software engineering innovations, e.g., tools and processes, often include human-subject studies as a component of a multi-pronged strategy to obtain greater generalizability of the findings. However, human-subject studies in our field are challenging, due to the cost and difficulty of finding and employing suitable subjects, ideally, professional programmers with varying degrees of experience. Meanwhile, large language models (LLMs) have recently started to demonstrate human-level performance in several areas. This paper explores the possibility of substituting costly human subjects with much cheaper LLM queries in evaluations of code and code-related artifacts. We study this idea by applying six state-of-the-art LLMs to ten annotation tasks from five datasets created by prior work, such as judging the accuracy of a natural language summary of a method or…
