AsgardBench -- Evaluating Visually Grounded Interactive Planning Under Minimal Feedback

Andrea Tupini; Lars Liden; Reuben Tan; Yu Wang; Jianfeng Gao

arXiv:2603.15888·cs.AI·March 20, 2026

Open Access

TL;DR

AsgardBench is a benchmark designed to evaluate the ability of vision-language models to adapt high-level plans during execution based solely on visual observations, emphasizing interactive planning and plan repair.

Contribution

The paper introduces a controlled benchmark for assessing visual grounding and plan adaptation in embodied AI, focusing on interactive planning without low-level control noise.

Findings

01 Performance drops significantly without visual input
02 Models struggle with visual grounding and state tracking
03 Benchmark reveals weaknesses in plan adaptation capabilities

Abstract

With AsgardBench we aim to evaluate visually grounded, high-level action sequence generation and interactive planning, focusing specifically on plan adaptation during execution based on visual observations rather than navigation or low-level manipulation. In the landscape of embodied AI benchmarks, AsgardBench targets the capability category of interactive planning, which is more sophisticated than offline high-level planning as it requires agents to revise plans in response to environmental feedback, yet remains distinct from low-level execution. Unlike prior embodied AI benchmarks that conflate reasoning with navigation or provide rich corrective feedback that substitutes for perception, AsgardBench restricts agent input to images, action history, and lightweight success/failure signals, isolating interactive planning in a controlled simulator without low-level control noise. The…
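The input restriction described in the abstract (images, action history, and a lightweight success/failure signal) can be illustrated with a small Python sketch of an interaction loop. Everything here is hypothetical and invented for illustration: the `Observation` fields, the `ToyEnv` mug scenario, and the two policies are not AsgardBench's API or tasks, and the image is faked as a byte string rather than a rendered frame. The sketch only shows why an agent that reads the image can repair its plan while one that ignores it cannot.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Observation:
    """Hypothetical per-step agent input: nothing beyond these three fields."""
    image: bytes                    # current frame (faked as a byte label here)
    action_history: List[str]       # high-level actions attempted so far
    last_action_ok: Optional[bool]  # success/failure of the previous action, None at start


class ToyEnv:
    """Invented stand-in for a simulator: a mug must be clean before it is filled."""

    def __init__(self) -> None:
        self.mug_clean = False
        self.done = False

    def render(self) -> bytes:
        # The mug's state is only visible through the "image".
        return b"clean_mug" if self.mug_clean else b"dirty_mug"

    def step(self, action: str) -> bool:
        # Returns only a success flag, with no explanation of why an action failed.
        if action == "wash mug":
            self.mug_clean = True
            return True
        if action == "fill mug" and self.mug_clean:
            self.done = True
            return True
        return False


def run_episode(env: ToyEnv, policy: Callable[[Observation], str],
                max_steps: int = 10) -> List[str]:
    """Run the observe, act, observe loop and return the actions taken."""
    history: List[str] = []
    last_ok: Optional[bool] = None
    for _ in range(max_steps):
        obs = Observation(env.render(), list(history), last_ok)
        action = policy(obs)
        last_ok = env.step(action)
        history.append(action)
        if env.done:
            break
    return history


def visual_policy(obs: Observation) -> str:
    # Stand-in for a vision-language model that grounds its plan in the image.
    return "wash mug" if obs.image == b"dirty_mug" else "fill mug"


def blind_policy(obs: Observation) -> str:
    # Ignores the image and follows the original plan regardless of failures.
    return "fill mug"
```

In this toy setup, `run_episode(ToyEnv(), visual_policy)` inserts the washing step before filling, while `blind_policy` keeps repeating the failing action until the step limit, because the success flag alone does not say what went wrong.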

Taxonomy

Topics: AI-based Problem Solving and Planning · Multimodal Machine Learning Applications · Robotic Path Planning Algorithms