ViPlan: A Benchmark for Visual Planning with Symbolic Predicates and Vision-Language Models
Matteo Merler, Nicola Dainese, Minttu Alakuijala, Giovanni Bonetta, Pietro Ferrazzi, Yu Tian, Bernardo Magnini, Pekka Marttinen

TL;DR
ViPlan introduces a new benchmark to compare symbolic and direct vision-language model planning methods across visual domains, revealing strengths and limitations of each approach in different scenarios.
Contribution
This work provides the first open-source benchmark for evaluating VLM-grounded symbolic planning versus direct VLM planning in visual tasks.
Findings
VLM-as-grounder outperforms in Blocksworld (46% vs 9%)
VLM-as-planner outperforms in household robotics (34% vs 5%)
Chain-of-Thought prompting shows no consistent benefit
Abstract
Integrating Large Language Models with symbolic planners is a promising direction for obtaining verifiable and grounded plans, with recent work extending this idea to visual domains using Vision-Language Models (VLMs). However, a rigorous comparison with methods that plan directly with VLMs is missing, due to a lack of visual benchmarks that support symbolic planning. We present ViPlan, the first open-source benchmark for comparing VLM-grounded symbolic approaches (VLM-as-grounder) with direct VLM planning methods (VLM-as-planner). ViPlan introduces a series of increasingly challenging tasks in two visual domains: a visual variant of the classic Blocksworld planning problem and a simulated household robotics environment. We find VLM-as-grounder methods to outperform direct VLM planning in Blocksworld (solving 46% of the tasks against 9%), where image grounding is both crucial and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · AI-based Problem Solving and Planning · Robotic Path Planning Algorithms
