AsgardBench -- Evaluating Visually Grounded Interactive Planning Under Minimal Feedback
Andrea Tupini, Lars Liden, Reuben Tan, Yu Wang, Jianfeng Gao

TL;DR
AsgardBench is a benchmark designed to evaluate the ability of vision-language models to adapt high-level plans during execution based solely on visual observations, emphasizing interactive planning and plan repair.
Contribution
It introduces a controlled benchmark for assessing visual grounding and plan adaptation in embodied AI, focusing on interactive planning without low-level control noise.
Findings
Performance drops significantly without visual input
Models struggle with visual grounding and state tracking
Benchmark reveals weaknesses in plan adaptation capabilities
Abstract
With AsgardBench we aim to evaluate visually grounded, high-level action sequence generation and interactive planning, focusing specifically on plan adaptation during execution based on visual observations rather than navigation or low-level manipulation. In the landscape of embodied AI benchmarks, AsgardBench targets the capability category of interactive planning, which is more sophisticated than offline high-level planning as it requires agents to revise plans in response to environmental feedback, yet remains distinct from low-level execution. Unlike prior embodied AI benchmarks that conflate reasoning with navigation or provide rich corrective feedback that substitutes for perception, AsgardBench restricts agent input to images, action history, and lightweight success/failure signals, isolating interactive planning in a controlled simulator without low-level control noise. The…
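The input restriction described in the abstract (images, action history, and a lightweight success/failure signal) can be illustrated with a small interaction loop. The following is a minimal sketch under stated assumptions, not the benchmark's actual interface: `Observation`, `ToyEnv`, `visual_planner`, and `run_episode` are hypothetical names, the mug scenario is invented for illustration, and a short string stands in for the image a real agent would receive.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Observation:
    image: str                    # stand-in for pixels (assumption; a real agent gets an image)
    action_history: List[str]     # high-level actions issued so far
    last_success: Optional[bool]  # success/failure of the previous action; None before step 1


class ToyEnv:
    """Hypothetical world: a mug that may be dirty must be cleaned before it can be filled."""

    def __init__(self, mug_dirty: bool) -> None:
        self.mug_dirty = mug_dirty
        self.filled = False

    def render(self) -> str:
        if self.filled:
            return "filled_mug"
        return "dirty_mug" if self.mug_dirty else "clean_mug"

    def step(self, action: str) -> bool:
        """Apply a high-level action; return only success or failure, with no explanation."""
        if action == "clean_mug" and self.mug_dirty:
            self.mug_dirty = False
            return True
        if action == "fill_mug" and not self.mug_dirty and not self.filled:
            self.filled = True
            return True
        return False


def visual_planner(obs: Observation) -> Optional[str]:
    """Pick the next action from the image alone; None means the task is done."""
    if obs.image == "filled_mug":
        return None
    return "clean_mug" if obs.image == "dirty_mug" else "fill_mug"


def run_episode(
    env: ToyEnv,
    planner: Callable[[Observation], Optional[str]],
    max_steps: int = 10,
) -> List[str]:
    """Observe, choose one action, execute it, and repeat until the planner stops."""
    history: List[str] = []
    last_success: Optional[bool] = None
    for _ in range(max_steps):
        obs = Observation(env.render(), list(history), last_success)
        action = planner(obs)
        if action is None:
            break
        last_success = env.step(action)
        history.append(action)
    return history
```

In this toy setting, a planner that reads the image inserts the extra `clean_mug` step when the mug is dirty, while a planner that ignores the image and follows a fixed plan receives only a bare failure signal and no hint about what went wrong.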
Taxonomy
Topics: AI-based Problem Solving and Planning · Multimodal Machine Learning Applications · Robotic Path Planning Algorithms
