VisRefiner: Learning from Visual Differences for Screenshot-to-Code Generation

Jie Deng; Kaichun Yao; Libo Zhang

arXiv:2602.05998·cs.CV·February 6, 2026

Open Access

TL;DR

VisRefiner is a training framework for screenshot-to-code generation in which the model learns from visual differences between its rendered output and the reference design, improving single-step generation quality and self-refinement.

Contribution

The paper introduces difference-aligned supervision, which associates visual discrepancies with the corresponding code edits, and a reinforcement learning stage that lets models learn from those discrepancies, improving layout fidelity and self-refinement in screenshot-to-code tasks.

Findings

01. Substantial improvement in single-step generation quality.

02. Enhanced layout fidelity in generated code.

03. Models demonstrate strong self-refinement abilities.

Abstract

Screenshot-to-code generation aims to translate user interface screenshots into executable frontend code that faithfully reproduces the target layout and style. Existing multimodal large language models perform this mapping directly from screenshots but are trained without observing the visual outcomes of their generated code. In contrast, human developers iteratively render their implementation, compare it with the design, and learn how visual differences relate to code changes. Inspired by this process, we propose VisRefiner, a training framework that enables models to learn from visual differences between rendered predictions and reference designs. We construct difference-aligned supervision that associates visual discrepancies with corresponding code edits, allowing the model to understand how appearance variations arise from implementation changes. Building on this, we introduce a…
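The render-and-compare step described in the abstract can be illustrated with a small sketch. This is not the paper's method: the function name `visual_difference`, the grayscale list-of-lists image format, and the `threshold` parameter are all assumptions chosen for illustration. It only shows one simple way to quantify how far a rendered prediction is from a reference design and where the mismatch lies.

```python
from typing import List, Optional, Tuple

# Assumption: images are grayscale grids (0-255), row-major, same size.
Image = List[List[int]]
BBox = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def visual_difference(
    reference: Image, rendered: Image, threshold: int = 16
) -> Tuple[float, Optional[BBox]]:
    """Return (fraction of differing pixels, bounding box of the differences).

    The bounding box is None when the images match within the threshold.
    This is a hypothetical helper, not an API from the paper.
    """
    if len(reference) != len(rendered) or any(
        len(a) != len(b) for a, b in zip(reference, rendered)
    ):
        raise ValueError("images must have the same dimensions")

    # Collect coordinates where the pixel values differ by more than threshold.
    differing = [
        (y, x)
        for y, (ref_row, out_row) in enumerate(zip(reference, rendered))
        for x, (ref_px, out_px) in enumerate(zip(ref_row, out_row))
        if abs(ref_px - out_px) > threshold
    ]
    total = sum(len(row) for row in reference)
    if not differing or total == 0:
        return 0.0, None

    ys = [y for y, _ in differing]
    xs = [x for _, x in differing]
    return len(differing) / total, (min(ys), min(xs), max(ys), max(xs))


# Toy example: the reference has a dark 2x2 block; in the rendering the
# block is shifted one pixel to the right.
reference = [
    [255, 255, 255, 255],
    [255, 0, 0, 255],
    [255, 0, 0, 255],
    [255, 255, 255, 255],
]
rendered = [
    [255, 255, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 0],
    [255, 255, 255, 255],
]
ratio, bbox = visual_difference(reference, rendered)
print(ratio, bbox)
```

In a real pipeline the two grids would come from a screenshot of the design and a browser rendering of the generated code, and the located region would be what a refinement step tries to explain with a code edit.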

Taxonomy

Topics: Teaching and Learning Programming · Spreadsheets and End-User Computing · Software Engineering Research