MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao, Weiwei Guo, Lili Qiu, Mingxi Cheng, Qi Dai, Zhendong Wang, Zhengyuan Yang, Xue Yang, Ji Li, Lijuan Wang, and Chong Luo

arXiv:2604.15309 · cs.CV · April 17, 2026

TL;DR

MM-WebAgent is a hierarchical agentic framework for multimodal webpage generation. It coordinates content and layout through hierarchical planning and iterative self-reflection so that the resulting pages stay coherent, and it is evaluated on a new benchmark.

Contribution

It introduces a hierarchical agentic framework for multimodal webpage generation, together with a benchmark and a multi-level evaluation protocol for systematic assessment.

Findings

1. Outperforms baselines in multimodal element generation and integration.
2. Produces more coherent and visually consistent webpages.
3. Demonstrates its effectiveness through experiments on the proposed benchmark.

Abstract

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment.…
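The abstract describes the control flow only at a high level: plan globally first, generate each multimodal element locally, then self-reflect and refine. The sketch below illustrates one way such a plan, generate, reflect loop can keep isolated AIGC outputs stylistically consistent. It assumes that a single shared style token stands in for the global design and that reflection is a plain string comparison rather than a model-based critique. `PagePlan`, `plan_page`, `reflect`, `generate_page`, and `make_flaky_generator` are hypothetical names, not the authors' code or API, and the sketch does not model layout optimization or integration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Element:
    kind: str        # e.g. "image", "video", "chart"
    prompt: str      # local content description for the AIGC tool
    style: str = ""  # style the generated asset actually ended up with


@dataclass
class PagePlan:
    layout: List[str]  # ordered section names (global layout)
    style: str         # one shared style token for the whole page (assumption)
    elements: Dict[str, Element] = field(default_factory=dict)


def plan_page(request: str) -> PagePlan:
    """Hypothetical global planner: fixes layout and style before any asset exists."""
    plan = PagePlan(layout=["hero", "features", "stats"], style="flat-pastel")
    plan.elements = {
        "hero": Element("image", f"banner for {request}"),
        "features": Element("image", f"icons for {request}"),
        "stats": Element("chart", f"usage chart for {request}"),
    }
    return plan


def reflect(plan: PagePlan) -> List[str]:
    """Hypothetical self-check: list elements whose style drifts from the page style."""
    return [name for name, el in plan.elements.items() if el.style != plan.style]


def generate_page(
    request: str,
    generate: Callable[[Element, str], str],
    max_rounds: int = 3,
) -> PagePlan:
    plan = plan_page(request)
    # Local pass: every element is generated conditioned on the shared style.
    for el in plan.elements.values():
        el.style = generate(el, plan.style)
    # Reflection loop: regenerate only the flagged elements, at most max_rounds times.
    for _ in range(max_rounds):
        flagged = reflect(plan)
        if not flagged:
            break
        for name in flagged:
            plan.elements[name].style = generate(plan.elements[name], plan.style)
    return plan


def make_flaky_generator() -> Callable[[Element, str], str]:
    """Stand-in for an AIGC tool: a chart ignores the requested style on its first call."""
    seen = set()

    def generate(el: Element, style: str) -> str:
        if el.kind == "chart" and el.prompt not in seen:
            seen.add(el.prompt)
            return "default-theme"
        return style

    return generate


page = generate_page("a coffee shop", make_flaky_generator())
print(reflect(page))                 # []
print(page.elements["stats"].style)  # flat-pastel
```

In this run the chart first comes back in a default theme, the reflection step flags only `stats`, and a single regeneration round brings it in line with the page style. Regenerating only the flagged elements, instead of the whole page, keeps already consistent assets fixed; the abstract does not say how MM-WebAgent itself scopes its refinements.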


Code & Models

Datasets

microsoft/MM-WebGen-Bench
