WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation
Mingde Xu, Zhen Yang, Wenyi Hong, Lihang Pan, Xinyue Fan, Yan Wang, Xiaotao Gu, Bin Xu, Jie Tang

TL;DR
WebVIA introduces an interactive framework that automates UI-to-Code generation and validation, enabling the creation of executable, interactive web interfaces from design mockups with improved stability and accuracy.
Contribution
It is the first agentic framework for interactive UI-to-Code generation and validation, combining multi-state UI exploration, code generation, and interactivity verification.
Findings
WebVIA-Agent outperforms general-purpose agents in UI exploration stability.
Fine-tuned WebVIA-UI2Code models generate more executable and interactive code.
Significant improvements over baseline models on UI2Code benchmarks.
Abstract
User interface (UI) development requires translating design mockups into functional code, a process that remains repetitive and labor-intensive. While recent Vision-Language Models (VLMs) automate UI-to-Code generation, they generate only static HTML/CSS/JavaScript layouts lacking interactivity. To address this, we propose WebVIA, the first agentic framework for interactive UI-to-Code generation and validation. The framework comprises three components: 1) an exploration agent to capture multi-state UI screenshots; 2) a UI2Code model that generates executable interactive code; 3) a validation module that verifies the interactivity. Experiments demonstrate that WebVIA-Agent achieves more stable and accurate UI exploration than general-purpose agents (e.g., Gemini-2.5-Pro). In addition, our fine-tuned WebVIA-UI2Code models exhibit substantial improvements in generating executable and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpreadsheets and End-User Computing · Software Engineering Research · Interactive and Immersive Displays
