When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation

Amal AKLI; Mike PAPADAKIS; Maxime CORDY; Yves Le TRAON

arXiv:2604.24712 · cs.SE · April 28, 2026


TL;DR

This exploratory study examines how prompt structure and specification richness affect the robustness of LLM-based code generation. It finds that structurally richer prompts mitigate the effects of under-specification, and that prompt mutations can sometimes improve correctness.

Contribution

The paper shows that robustness to prompt mutations depends strongly on prompt structure: structurally richer prompts reduce sensitivity to under-specification, and some mutations improve correctness by disrupting misleading cues.

Findings

01. Robustness varies with prompt structure and task complexity.

02. Structurally rich prompts mitigate under-specification effects.

03. Prompt mutations can disrupt misleading cues and improve correctness.

Abstract

Large language models are increasingly used for code generation, yet the correctness of their outputs depends not only on model capability but also on how tasks are specified. Prior studies demonstrate that small changes in natural language prompts, particularly under-specification, can substantially reduce code correctness; however, these findings are largely based on minimal-specification benchmarks such as HumanEval and MBPP, where limited structural redundancy may exaggerate sensitivity. In this exploratory study, we investigate how prompt structure, task complexity, and specification richness interact with LLM robustness to prompt mutations. We evaluate 10 different models across HumanEval and the structurally richer LiveCodeBench. Our results reveal that robustness is not a fixed property of LLMs but is highly dependent on prompt structure: the same under-specification mutations…
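To make the idea of measuring robustness to an under-specification mutation concrete, here is a minimal Python sketch. It is not the authors' pipeline: the mutation operator `drop_sentence`, the helpers `pass_rate` and `correctness_delta`, and the pass/fail values are all hypothetical and chosen only for illustration. The sketch assumes a mutation that deletes one sentence from a prompt, and that each task has already been scored as passing or failing its tests.

```python
from typing import List


def drop_sentence(prompt: str, index: int) -> str:
    """Hypothetical under-specification mutation: delete one sentence.

    Sentences are split naively on '.', which is an assumption made
    for brevity and is not a robust sentence tokenizer.
    """
    sentences = [s.strip() for s in prompt.split(".") if s.strip()]
    kept = [s for i, s in enumerate(sentences) if i != index]
    return (". ".join(kept) + ".") if kept else ""


def pass_rate(results: List[bool]) -> float:
    """Fraction of tasks whose generated code passed all tests."""
    return sum(results) / len(results) if results else 0.0


def correctness_delta(original: List[bool], mutated: List[bool]) -> float:
    """Pass rate with mutated prompts minus pass rate with original prompts.

    A negative value means the mutation hurt correctness; a positive
    value means correctness improved after the mutation.
    """
    return pass_rate(mutated) - pass_rate(original)


# Illustrative prompt and made-up outcomes (not data from the paper).
prompt = "Return the sum of a list. Return 0 for an empty list. Ignore None values."
mutated_prompt = drop_sentence(prompt, 1)

original_results = [True, False, True, True]
mutated_results = [True, True, True, True]
delta = correctness_delta(original_results, mutated_results)
```

In this toy setup a positive `delta` corresponds to the case named in the title, where removing part of the specification coincides with higher correctness. A real study would generate code with each model for both prompt versions and run the benchmark's test suites to obtain the pass/fail lists.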
