FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation

Hongda Zhu; Yiwen Zhang; Bing Zhao; Jingzhe Ding; Siyao Liu; Tong Liu; Dandan Wang; Yanan Liu; Zhaojian Li

arXiv:2506.13832 · cs.SE · June 19, 2025

TL;DR

FrontendBench is a comprehensive, interactive benchmark with automatic evaluation for assessing large language models' ability to generate realistic front-end code. It addresses the overly simple tasks and non-rigorous tests of previous benchmarks.

Contribution

We introduce FrontendBench, a new benchmark with interactive test scenarios and an automatic evaluation framework for more accurate assessment of LLMs in front-end development.

Findings

1. High agreement (90.54%) between automatic and human evaluations.
2. Significant performance disparities among state-of-the-art LLMs.
3. Benchmark covers diverse web components and realistic development challenges.
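The 90.54% figure is an agreement rate between the automatic evaluator and human judges. This excerpt does not say how agreement is counted, so the sketch below assumes the simplest reading: the share of test cases on which both give the same pass/fail verdict. The function name `agreement_rate` is hypothetical.

```python
from typing import Sequence


def agreement_rate(auto: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of cases where the automatic and human pass/fail verdicts match.

    Assumes plain per-case agreement; the paper's exact definition may differ.
    """
    if len(auto) != len(human):
        raise ValueError("verdict lists must have the same length")
    if not auto:
        raise ValueError("need at least one verdict")
    matches = sum(a == h for a, h in zip(auto, human))
    return matches / len(auto)


# Verdicts agree on cases 1, 3 and 4 but not on case 2.
print(agreement_rate([True, True, False, True], [True, False, False, True]))  # 0.75
```

Raw agreement is easy to read but does not correct for chance. A chance-corrected statistic such as Cohen's kappa would report a different number for the same verdicts.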

Abstract

Large Language Models (LLMs) have made significant strides in front-end code generation. However, existing benchmarks exhibit several critical limitations: many tasks are overly simplistic, test cases often lack rigor, and end-to-end validation is absent. These issues hinder the accurate assessment of model performance. To address these challenges, we present FrontendBench, a benchmark co-developed by humans and LLMs. FrontendBench categorizes tasks based on code functionality and incorporates interactive test scenarios, enabling a more comprehensive and practical evaluation of front-end code generation capabilities. The benchmark comprises 148 meticulously crafted prompt-test case pairs spanning five levels of web components, from basic UI elements to complex interactive features. Each task reflects realistic front-end development challenges. Furthermore, we introduce an automatic…
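The abstract describes each task as a prompt paired with a test case and assigned to one of five levels of web components. Below is a minimal sketch of that layout with a per-level pass-rate tally. The `Task` fields, the 1 to 5 integer encoding of levels, and `pass_rate_by_level` are illustrative assumptions, not the benchmark's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    """One prompt-test case pair (hypothetical schema)."""
    task_id: str
    level: int        # assumed encoding: 1 = basic UI element ... 5 = complex interactive feature
    prompt: str       # natural-language request given to the model
    test_script: str  # test case run against the generated front-end code

    def __post_init__(self) -> None:
        if not 1 <= self.level <= 5:
            raise ValueError(f"level must be 1..5, got {self.level}")


def pass_rate_by_level(tasks: List[Task], passed: Dict[str, bool]) -> Dict[int, float]:
    """Share of tasks per level whose test passed; a missing verdict counts as a failure."""
    totals: Dict[int, int] = defaultdict(int)
    wins: Dict[int, int] = defaultdict(int)
    for task in tasks:
        totals[task.level] += 1
        if passed.get(task.task_id, False):
            wins[task.level] += 1
    return {level: wins[level] / totals[level] for level in sorted(totals)}


tasks = [
    Task("btn-01", 1, "Render a primary button", "// test omitted"),
    Task("btn-02", 1, "Render a disabled button", "// test omitted"),
    Task("cart-01", 5, "Build a shopping cart with add/remove", "// test omitted"),
]
print(pass_rate_by_level(tasks, {"btn-01": True, "btn-02": False, "cart-01": True}))
# {1: 0.5, 5: 1.0}
```

Breaking results down by level, rather than reporting a single overall score, shows whether a model's failures concentrate in the complex interactive tasks.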

Taxonomy

Topics: Scientific Computing and Data Management