WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics

Chenxu Liu; Yingjie Fu; Wei Yang; Ying Zhang; Tao Xie

arXiv:2601.02430 · cs.SE · March 17, 2026

Open Access

TL;DR

WebCoderBench is a comprehensive and interpretable benchmark for evaluating web application generation by LLMs, using real user requirements and diverse metrics to guide model improvement.

Contribution

It introduces the first real-world, generalizable, and interpretable benchmark with 1,572 user requirements and 24 evaluation metrics for web app generation.

Findings

01. No single model dominates across all metrics

02. WebCoderBench enables targeted model optimization

03. Experiments reveal diverse strengths and weaknesses of LLMs
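
One way to make the finding that no single model dominates across all metrics concrete is a dominance check over per-metric scores. The sketch below is illustrative only and is not taken from the paper: the function names `dominates` and `dominant_models`, the model names, the metric names, and all score values are hypothetical placeholders, and it assumes higher scores are better on every metric.

```python
from typing import Dict, List

# Maps a metric name to a score; higher is assumed to be better.
Scores = Dict[str, float]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if `a` is at least as good as `b` on every metric
    and strictly better on at least one (both share the same metrics)."""
    metrics = list(a.keys())
    at_least_as_good = all(a[m] >= b[m] for m in metrics)
    strictly_better = any(a[m] > b[m] for m in metrics)
    return at_least_as_good and strictly_better


def dominant_models(results: Dict[str, Scores]) -> List[str]:
    """Return the models that dominate every other model in `results`."""
    return [
        name
        for name, scores in results.items()
        if all(
            dominates(scores, other)
            for other_name, other in results.items()
            if other_name != name
        )
    ]


# Hypothetical placeholder data, NOT results reported by WebCoderBench.
results = {
    "model_a": {"metric_x": 0.8, "metric_y": 0.6, "metric_z": 0.7},
    "model_b": {"metric_x": 0.7, "metric_y": 0.9, "metric_z": 0.7},
    "model_c": {"metric_x": 0.6, "metric_y": 0.5, "metric_z": 0.9},
}

if __name__ == "__main__":
    # An empty list means no model dominates all the others.
    print(dominant_models(results))
```

With placeholder data like this, each model leads on a different metric, so the list of dominant models is empty. A per-metric breakdown of this kind is what lets strengths and weaknesses be read off model by model instead of from a single aggregate score.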

Abstract

Web applications (web apps) have become a key arena for large language models (LLMs) to demonstrate their code generation capabilities and commercial potential. However, building a benchmark for LLM-generated web apps remains challenging due to the need for real-world user requirements, generalizable evaluation metrics that do not rely on ground-truth implementations or test cases, and interpretable evaluation results. To address these challenges, we introduce WebCoderBench, the first real-world-collected, generalizable, and interpretable benchmark for web app generation. WebCoderBench comprises 1,572 real user requirements, covering diverse modalities and expression styles that reflect realistic user intentions. WebCoderBench provides 24 fine-grained evaluation metrics across 9 perspectives, combining rule-based and LLM-as-a-judge paradigms for fully automated, objective, and general…

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Software Engineering Research · Software Engineering Techniques and Practices