HumanEvalComm: Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agent

Jie JW Wu; Fatemeh H Fard

arXiv:2406.00215 · cs.SE · January 29, 2025 · 1 citation

TL;DR

This paper introduces HumanEvalComm, a benchmark to evaluate the communication skills of code-generating LLMs, emphasizing their ability to ask clarifying questions to improve code quality.

Contribution

It proposes a new benchmark and new evaluation metrics for assessing the communication skills of LLMs in code generation, and introduces Okanagan, an LLM agent approach.

Findings

1. LLMs often struggle with ambiguous or incomplete problem descriptions.
2. The Okanagan agent effectively identifies problematic descriptions and asks clarifying questions.
3. Communication skills improve code accuracy and relevance.

Abstract

Large language models (LLMs) have significantly improved their ability to perform tasks in the field of code generation. However, there is still a gap between LLMs being capable coders and being top-tier software engineers. Based on the observation that top-level software engineers often ask clarifying questions to reduce ambiguity in both requirements and coding solutions, we argue that the same should be applied to LLMs for code generation tasks. In this work, we conducted an empirical study on the benchmark and analysis of the communication skills of LLMs for code generation. We define communication skills of LLMs as "being able to ask clarifying questions when the description of the code generation problem has issues". We created a new benchmark, HumanEvalComm, by modifying problem descriptions according to three issues: inconsistency, ambiguity, incompleteness. We defined new…
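The benchmark idea above (problem descriptions modified to carry inconsistency, ambiguity, or incompleteness, and a model judged on whether it asks a clarifying question) can be illustrated with a short sketch. It is not the HumanEvalComm implementation: the `Issue` enum, the `ModifiedProblem` record, the `asks_clarifying_question` heuristic, and the `question_rate` helper are hypothetical names, and the heuristic itself (no code in the reply, at least one question mark) is an assumption, not the paper's evaluation method or one of its metrics.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Issue(Enum):
    """The three description issues named in the abstract."""
    INCONSISTENCY = "inconsistency"
    AMBIGUITY = "ambiguity"
    INCOMPLETENESS = "incompleteness"


@dataclass(frozen=True)
class ModifiedProblem:
    """Hypothetical record: one problem description modified to carry issues."""
    task_id: str
    issues: frozenset[Issue]
    prompt: str


# Hypothetical code detector: a Markdown fence or a Python "def name(" line.
_FENCE = "`" * 3
_DEF_LINE = re.compile(r"^\s*def\s+\w+\s*\(", re.MULTILINE)


def asks_clarifying_question(response: str) -> bool:
    """Assumed heuristic: no code in the reply and at least one question mark."""
    has_code = _FENCE in response or _DEF_LINE.search(response) is not None
    return not has_code and "?" in response


def question_rate(responses: list[str]) -> float:
    """Fraction of responses that ask a question instead of returning code."""
    if not responses:
        return 0.0
    asked = sum(asks_clarifying_question(r) for r in responses)
    return asked / len(responses)


if __name__ == "__main__":
    problem = ModifiedProblem(
        task_id="demo-1",  # hypothetical identifier
        issues=frozenset({Issue.AMBIGUITY}),
        prompt="Return the sorted values.",  # ambiguous: ascending or descending?
    )
    responses = [
        "Should the values be sorted in ascending or descending order?",
        "def sort_values(xs):\n    return sorted(xs)",
    ]
    print(question_rate(responses))  # prints 0.5
```

The question-mark heuristic is deliberately crude; for instance, it would count a rhetorical question in a reply without code as a clarifying question. How the paper actually classifies responses is not stated in this excerpt.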

Code & Models

Repositories

jie-jw-wu/human-eval-comm (official)

Datasets

jie-jw-wu/HumanEvalComm (dataset, 40 downloads)

Taxonomy

Topics: Digital Rights Management and Security · Artificial Intelligence in Law