TL;DR
UI2Code^N introduces an interactive, feedback-driven approach to translating UI screenshots into code, leveraging reinforcement learning and visual optimization for improved accuracy and efficiency.
Contribution
The paper reformulates UI-to-code generation as an interactive visual optimization problem. It contributes Relative Visual Policy Optimization (RVPO), a preference-based reinforcement learning method, and UI2Code^N, an open-source 9B model.
Findings
State-of-the-art results on UI drafting, polishing, and editing benchmarks.
Performance improves with iterative visual optimization.
Outperforms larger models in UI-to-code tasks.
Abstract
UI-to-code aims to translate UI screenshots into executable front-end code. Despite progress with vision-language models (VLMs), most existing methods formulate UI-to-code as single-pass generation, which is a poor match for real-world UI development, an inherently iterative and feedback-driven process. We reformulate UI-to-code as an interactive visual optimization problem, where code generation is embedded in a closed-loop process of execution, visual inspection, and iterative refinement driven by rendered visual feedback. To address the non-differentiability of visual objectives and the noise of absolute visual evaluators, we propose Relative Visual Policy Optimization (RVPO), a preference-based reinforcement learning method that optimizes relative visual rankings among rendered candidates under execution feedback. We instantiate this paradigm in UI2Code^N, an open-source 9B model trained via…
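The two ideas in the abstract, a render-and-refine loop and a reward based on relative rather than absolute visual scores, can be illustrated with a small toy sketch. This is not the paper's implementation: the abstract excerpt is truncated and does not specify RVPO's objective, so `rank_advantages`, `refine`, and the `generate`, `render`, and `similarity` callables are all hypothetical names introduced here for illustration only.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rank_advantages(scores: Sequence[float]) -> List[float]:
    """Turn noisy absolute scores into zero-mean, rank-based advantages.

    Illustrative assumption, not the paper's RVPO formula: only the ordering
    of the candidates is used, so any monotone rescaling of the evaluator
    leaves the result unchanged. Tied candidates share their average rank.
    """
    n = len(scores)
    if n == 0:
        return []
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    mean_rank = (n - 1) / 2.0
    # Center and scale so the values lie in [-1, 1].
    return [(r - mean_rank) / mean_rank for r in ranks]


def refine(
    target: str,
    generate: Callable[[str, Optional[str], Optional[str]], str],
    render: Callable[[str], str],
    similarity: Callable[[str, str], float],
    rounds: int = 3,
) -> Tuple[str, float]:
    """Closed-loop refinement: generate, render, compare, keep the best.

    All three callables are hypothetical stand-ins: `generate` for a model
    call, `render` for executing the code, `similarity` for a visual scorer.
    """
    best_code = generate(target, None, None)
    best_score = similarity(target, render(best_code))
    for _ in range(rounds):
        # Feed the previous code and its rendering back to the generator.
        candidate = generate(target, best_code, render(best_code))
        score = similarity(target, render(candidate))
        if score > best_score:
            best_code, best_score = candidate, score
    return best_code, best_score
```

In this sketch a group of rendered candidates would be scored once, passed through `rank_advantages`, and the resulting values used as per-candidate learning signals. Using ranks is one simple way to discard the scale of a noisy scorer; the paper's actual formulation may differ.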
