InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

Qiyao Wang; Haoran Hu; Longze Chen; Hongbo Wang; Hamid Alinejad-Rokny; Yuan Lin; Min Yang

arXiv:2604.27419 · cs.AI · May 1, 2026

TL;DR

This paper introduces InteractWeb-Bench, a benchmark for evaluating multimodal agents in website generation under realistic, ambiguous user conditions, highlighting current limitations in intent understanding and interaction.

Contribution

It presents the first interactive benchmark with diverse user simulations and an environment for iterative refinement, addressing the gap in real-world, low-code website development scenarios.

Findings

01 MLLM-based agents often fail due to blind execution and poor intent recognition.

02 The benchmark simulates diverse user behaviors, including ambiguity and contradiction.

03 Current agents show limitations in adaptive interaction and requirement understanding.

Abstract

With the advancement of multimodal large language models (MLLMs) and coding agents, website development has shifted from manual programming to agent-based, project-level code synthesis. Existing benchmarks rely on idealized assumptions, particularly well-structured, information-rich inputs and static execution settings. In contrast, real-world development is constrained by a critical bottleneck: the semantic misalignment between ambiguous, low-quality instructions from non-expert users and the model's understanding, which results in a failure mode that we term blind execution. To address this gap, we introduce InteractWeb-Bench, the first multimodal interactive benchmark for website generation under non-expert, low-code user conditions. InteractWeb-Bench introduces four types of user agents and persona-driven instruction perturbations to systematically simulate diverse user behaviors,…

Code & Models

Repositories

aiforip/InteractWeb-Bench (GitHub)