VisRefiner: Learning from Visual Differences for Screenshot-to-Code Generation
Jie Deng, Kaichun Yao, Libo Zhang

TL;DR
VisRefiner is a training framework for screenshot-to-code generation in which the model learns from visual differences between its rendered output and the reference design, improving generation accuracy and enabling self-refinement.
Contribution
The paper introduces difference-aligned supervision, which pairs visual discrepancies with the code edits that resolve them, and a reinforcement learning stage that lets models learn from those discrepancies. Together these improve layout fidelity and self-refinement in screenshot-to-code tasks.
Findings
Substantial improvement in single-step generation quality.
Enhanced layout fidelity in generated code.
Models demonstrate strong self-refinement abilities.
Abstract
Screenshot-to-code generation aims to translate user interface screenshots into executable frontend code that faithfully reproduces the target layout and style. Existing multimodal large language models perform this mapping directly from screenshots but are trained without observing the visual outcomes of their generated code. In contrast, human developers iteratively render their implementation, compare it with the design, and learn how visual differences relate to code changes. Inspired by this process, we propose VisRefiner, a training framework that enables models to learn from visual differences between rendered predictions and reference designs. We construct difference-aligned supervision that associates visual discrepancies with corresponding code edits, allowing the model to understand how appearance variations arise from implementation changes. Building on this, we introduce a…
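To make the idea of a "visual difference" between a rendered prediction and a reference design concrete, here is a minimal sketch. It is not the authors' implementation: the function names (diff_mask, diff_bbox, diff_ratio), the grayscale nested-list image format, and the threshold value are all assumptions chosen for illustration. It marks pixels that differ by more than a threshold, then reports the bounding box of the differing region and the fraction of pixels affected.

```python
from typing import List, Optional, Tuple

# Assumption: an image is a row-major grid of grayscale values in 0..255.
Image = List[List[int]]
Mask = List[List[bool]]


def diff_mask(rendered: Image, reference: Image, threshold: int = 16) -> Mask:
    """Mark each pixel whose absolute difference exceeds the threshold."""
    same_shape = len(rendered) == len(reference) and all(
        len(a) == len(b) for a, b in zip(rendered, reference)
    )
    if not same_shape:
        raise ValueError("images must have the same shape")
    return [
        [abs(p - q) > threshold for p, q in zip(row_r, row_t)]
        for row_r, row_t in zip(rendered, reference)
    ]


def diff_bbox(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of differing pixels, or None."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, flagged in enumerate(row) if flagged]
    return (min(cols), min(rows), max(cols), max(rows))


def diff_ratio(mask: Mask) -> float:
    """Fraction of pixels flagged as different (0.0 for an empty image)."""
    total = sum(len(row) for row in mask)
    flagged = sum(1 for row in mask for value in row if value)
    return flagged / total if total else 0.0


# Hypothetical example: a 2x2 box that the prediction renders one pixel
# too far to the right compared with the reference design.
reference = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
rendered = [
    [0, 0, 0, 0],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 0, 0],
]

mask = diff_mask(rendered, reference)
bbox = diff_bbox(mask)
ratio = diff_ratio(mask)
```

A real pipeline would render HTML in a browser and compare full-color screenshots, likely with a perceptual metric rather than a raw pixel threshold. The sketch only shows how a localized discrepancy, such as a shifted element, could be turned into a signal that points at the region of the page needing a code edit.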
Taxonomy
Topics: Teaching and Learning Programming · Spreadsheets and End-User Computing · Software Engineering Research
