The First Prompt Counts the Most! An Evaluation of Large Language Models on Iterative Example-Based Code Generation

Yingjie Fu; Bozhou Li; Linyi Li; Wentao Zhang; Tao Xie

arXiv:2411.06774 · cs.SE · May 13, 2025

TL;DR

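This study evaluates how well large language models generate code when the target functionality is specified through input-output examples supplied over multiple iterations. It finds that example-based prompts are substantially harder for LLMs than natural-language descriptions and points to ways of improving example-based prompting strategies.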

Contribution

The paper presents the first comprehensive study of LLMs in example-based code generation, introducing an iterative evaluation framework and a new benchmark of 172 functionalities.
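To make the idea of an iterative, example-based evaluation concrete, the following is a minimal sketch and not the authors' framework. It assumes integer-to-integer functions, a hypothetical `generate` callable standing in for an LLM call, and a simple policy of adding one failing I/O pair to the prompt per round. All names here (`evaluate_iteratively`, `find_counterexample`, `toy_generator`) are illustrative assumptions.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[int, int]          # (input, expected output)
Program = Callable[[int], int]
# Hypothetical stand-in for an LLM call: maps the examples shown so far
# to a candidate program.
Generator = Callable[[List[Example]], Program]


def find_counterexample(candidate: Program, target: Program,
                        probe_inputs: List[int]) -> Optional[Example]:
    """Return the first probe input on which candidate and target disagree."""
    for x in probe_inputs:
        expected = target(x)
        if candidate(x) != expected:
            return (x, expected)
    return None


def evaluate_iteratively(generate: Generator, target: Program,
                         initial_examples: List[Example],
                         probe_inputs: List[int],
                         max_iterations: int = 5) -> Optional[int]:
    """Return the 1-based iteration at which the candidate agrees with the
    target on all probe inputs, or None if the iteration budget runs out."""
    examples = list(initial_examples)
    for iteration in range(1, max_iterations + 1):
        candidate = generate(examples)
        counterexample = find_counterexample(candidate, target, probe_inputs)
        if counterexample is None:
            return iteration
        # Feed the failing I/O pair back as an additional example.
        examples.append(counterexample)
    return None


def toy_generator(examples: List[Example]) -> Program:
    """Toy 'model': guesses identity unless an example contradicts it,
    then guesses doubling. A real study would call an LLM here."""
    if all(x == y for x, y in examples):
        return lambda x: x
    return lambda x: 2 * x


if __name__ == "__main__":
    solved_at = evaluate_iteratively(toy_generator, lambda x: 2 * x,
                                     [(0, 0)], [0, 1, 2, 3])
    print(solved_at)
```

In this toy run the single initial example `(0, 0)` is ambiguous, so the first guess fails and the counterexample `(1, 2)` resolves it in the second round. Recording the iteration at which each functionality is solved is what would let an evaluation of this shape compare the first prompt against later ones.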

Findings

01. LLM performance drops by more than 60% when prompts consist of I/O examples instead of natural-language descriptions.

02. Most of the functionalities that LLMs implement correctly are solved in the first iteration.

03. Combining I/O examples with natural-language descriptions improves LLM accuracy.

Abstract

The capabilities of Large Language Models (LLMs) in code generation have been extensively studied, particularly for implementing target functionalities from natural-language descriptions. Alternatively, input-output (I/O) examples provide an accessible, unambiguous, and flexible way to describe functionalities. However, their inherent diversity, opaqueness, and incompleteness impose greater challenges for understanding and implementing the target requirements. Therefore, generating code from I/O examples (i.e., example-based code generation) provides a new perspective, allowing us to additionally evaluate LLMs' capability to infer target functionalities from limited information and to process new-form requirements. However, related research about LLMs in example-based code generation remains largely unexplored. To fill this gap, this paper presents the first comprehensive study on…

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Software Engineering Research · Natural Language Processing Techniques