1D-Bench: A Benchmark for Iterative UI Code Generation with Visual Feedback in Real-World
Qiao Xu, Yipeng Yu, Chengxiao Feng, Xu Liu

TL;DR
1D-Bench is a real-world, multi-round benchmark for design-to-code UI generation that evaluates models' robustness and iterative editing capabilities using visual feedback and intermediate representations.
Contribution
We introduce 1D-Bench, a novel benchmark based on real e-commerce workflows for evaluating iterative UI code generation with visual feedback.
Findings
Iterative editing improves rendering success.
Models show increased robustness to intermediate representation errors.
Reinforcement learning-based editing yields only limited gains.
Abstract
Design-to-code translates high-fidelity UI designs into executable front-end implementations, but progress remains hard to compare due to inconsistent datasets, toolchains, and evaluation protocols. We introduce 1D-Bench, a benchmark grounded in real e-commerce workflows, where each instance provides a reference rendering and an exported intermediate representation that may contain extraction errors. 1D is short for one day, representing the efficient completion of design-to-code tasks in less than one day. Models take both as input, using the intermediate representation as structural cues while being evaluated against the reference rendering, which tests robustness to intermediate representation defects rather than literal adherence. 1D-Bench requires generating an executable React codebase under a fixed toolchain with an explicit component hierarchy, and defines a multi-round…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Model-Driven Software Engineering Techniques
