User Centric Evaluation of Code Generation Tools

Tanha Miah; Hong Zhu

arXiv:2402.03130 · cs.SE · June 19, 2024 · 2 cites

Open Access

TL;DR

This paper introduces a user-centric evaluation method for code generation tools like LLMs, focusing on usability and user experience, demonstrated through a case study on ChatGPT for R programming.

Contribution

It proposes a novel usability-focused evaluation framework incorporating metadata, multi-attempt testing, and user experience metrics, filling a gap in LLM assessment beyond capability comparison.

Findings

01. ChatGPT is highly useful for R code generation.

02. Users needed 1.61 attempts per task on average.

03. Conciseness is the weakest usability attribute, scoring 3.80/5.

Abstract

With the rapid advance of machine learning (ML) technology, large language models (LLMs) are increasingly explored as intelligent tools to generate program code from natural language specifications. However, existing evaluations of LLMs have focused on their capabilities in comparison with humans. It is desirable to evaluate their usability when deciding whether to use an LLM in software production. This paper proposes a user centric method for this purpose. It includes metadata in the test cases of a benchmark to describe their usages, conducts testing in a multi-attempt process that mimics the uses of LLMs, measures LLM generated solutions on a set of quality attributes that reflect usability, and evaluates the performance based on user experiences in the uses of LLMs as a tool. The paper also reports a case study with the method in the evaluation of ChatGPT's usability as a…
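To make the summary statistics quoted above (average attempts per task, per-attribute scores out of 5) more concrete, the following is a minimal Python sketch of how such figures could be aggregated from multi-attempt test records. It is not the authors' implementation: the `TaskResult` record, its field names, the metadata keys, the `accuracy` attribute, and all sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    """One benchmark task after multi-attempt testing (hypothetical record layout)."""
    task_id: str
    metadata: dict   # usage metadata attached to the test case (keys are assumptions)
    attempts: int    # number of attempts until the tester accepted or gave up
    scores: dict     # quality attribute -> rating on a 1..5 scale


def average_attempts(results):
    """Mean number of attempts per task."""
    return mean(r.attempts for r in results)


def attribute_means(results):
    """Mean rating per quality attribute across all tasks that rated it."""
    attributes = sorted({a for r in results for a in r.scores})
    return {
        a: mean(r.scores[a] for r in results if a in r.scores)
        for a in attributes
    }


def weakest_attribute(results):
    """Name of the attribute with the lowest mean rating."""
    means = attribute_means(results)
    return min(means, key=means.get)


# Illustrative, made-up records; these are NOT data from the paper.
results = [
    TaskResult("t1", {"topic": "data frames", "difficulty": "easy"}, 1,
               {"accuracy": 5, "conciseness": 4}),
    TaskResult("t2", {"topic": "plotting", "difficulty": "medium"}, 2,
               {"accuracy": 4, "conciseness": 3}),
    TaskResult("t3", {"topic": "strings", "difficulty": "hard"}, 3,
               {"accuracy": 4, "conciseness": 4}),
]

if __name__ == "__main__":
    print("average attempts:", average_attempts(results))
    print("attribute means:", attribute_means(results))
    print("weakest attribute:", weakest_attribute(results))
```

Keeping the metadata alongside each record would also allow the same aggregates to be broken down by task type or difficulty, which is one plausible reason to attach usage metadata to test cases.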


Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)

Methods: Sparse Evolutionary Training