Evaluating Repository-level Software Documentation via Question Answering and Feature-Driven Development

Xinchen Wang; Ruida Hu; Cuiyun Gao; Pengfei Gao; Chao Peng

arXiv:2604.06793·cs.SE·April 9, 2026

Abstract

Software documentation is crucial for repository comprehension. While Large Language Models (LLMs) advance documentation generation from code snippets to entire repositories, existing benchmarks have two key limitations: (1) they lack a holistic, repository-level assessment, and (2) they rely on unreliable evaluation strategies, such as LLM-as-a-judge, which suffers from vague criteria and limited repository-level knowledge. To address these issues, we introduce SWD-Bench, a novel benchmark for evaluating repository-level software documentation. Inspired by documentation-driven development, our strategy evaluates documentation quality by assessing an LLM's ability to understand and implement functionalities using the documentation, rather than by directly scoring it. This is measured through function-driven Question Answering (QA) tasks. SWD-Bench comprises three interconnected QA…
