NoCode-bench: A Benchmark for Evaluating Natural Language-Driven Feature Addition

Le Deng; Zhonghao Jiang; Jialun Cao; Michael Pradel; Zhongxin Liu

arXiv:2507.18130 · cs.SE · August 19, 2025

TL;DR

NoCode-bench is a new benchmark for evaluating large language models on real-world natural language-driven feature addition tasks, revealing current limitations of LLMs in no-code development.

Contribution

Introduces NoCode-bench, a benchmark of 634 real-world feature-addition tasks from 10 projects for assessing LLMs in natural language-driven software development. It also includes NoCode-bench Verified, a human-verified subset of 114 tasks for more reliable evaluation.

Findings

01. The best LLMs achieve a task success rate of only 28.07%.

02. Key challenges include cross-file editing and codebase understanding.

03. LLMs are not yet ready for fully NL-driven no-code development.

Abstract

Natural language-driven no-code development allows users to specify software functionality using natural language (NL) instead of editing source code, promising increased productivity and democratized development. Large language models (LLMs) show potential in enabling this paradigm. In this context, software documentation acts as an NL specification for functionality. This work introduces NoCode-bench, a benchmark designed to evaluate LLMs on real-world NL-driven feature addition tasks, consisting of 634 tasks across 10 projects and 114k code changes. Each task pairs documentation updates with corresponding code implementations, validated by developer-written test cases. A subset of 114 high-quality, human-verified instances, NoCode-bench Verified, ensures reliable evaluation. Our experiments reveal that, despite high token usage, the best LLMs achieve a task success rate of only…
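The evaluation setup described in the abstract uses a documentation change as the NL specification, developer-written tests as the oracle, and the task success rate as the metric. It can be sketched as follows. This is a minimal illustration under stated assumptions, not NoCode-bench's actual harness. The `FeatureTask` record and its field names, the split into feature tests and regression tests, and the rule that a task counts as solved only when every designated test passes are all hypothetical, loosely modeled on SWE-bench-style evaluation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FeatureTask:
    """Hypothetical record for one NL-driven feature-addition task."""
    task_id: str
    project: str
    doc_change: str  # documentation update acting as the NL specification
    # Developer-written tests that should pass once the feature is implemented
    feature_tests: list[str] = field(default_factory=list)
    # Existing tests that must keep passing (assumed regression check)
    regression_tests: list[str] = field(default_factory=list)


def is_resolved(task: FeatureTask, passed: set[str]) -> bool:
    """Assumed criterion: solved only if every designated test passed."""
    required = set(task.feature_tests) | set(task.regression_tests)
    return required <= passed


def success_rate(tasks: list[FeatureTask], results: dict[str, set[str]]) -> float:
    """Percentage of tasks whose candidate patch made all required tests pass."""
    if not tasks:
        return 0.0
    # Tasks missing from `results` are treated as having no passing tests
    solved = sum(is_resolved(t, results.get(t.task_id, set())) for t in tasks)
    return 100.0 * solved / len(tasks)


# Usage with made-up tasks: t1's patch passes everything, t2's passes nothing
tasks = [
    FeatureTask("t1", "projA", "Document a new `verbose` option",
                ["test_verbose"], ["test_basic"]),
    FeatureTask("t2", "projB", "Document a new API function", ["test_new_api"]),
]
results = {"t1": {"test_verbose", "test_basic"}, "t2": set()}
print(success_rate(tasks, results))  # 50.0
```

Including regression tests in the check is a design choice. It stops a patch that adds the feature but breaks existing behavior from counting as a success.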

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification