Evaluating Vision-Language Models as Evaluators in Path Planning
Mohamed Aghzal, Xiang Yue, Erion Plaku, Ziyu Yao

TL;DR
This paper introduces PathEval, a benchmark that assesses vision-language models (VLMs) as path evaluators in complex path-planning scenarios. It reveals the models' current limitations and the need for task-specific adaptation of their vision encoders.
Contribution
The work presents PathEval, a new benchmark for evaluating VLMs as plan evaluators in path planning, and analyzes how state-of-the-art models perform on it, highlighting the importance of vision encoder adaptation.
Findings
VLMs struggle with low-level perception in path evaluation
Abstracting traits of optimal paths is feasible for VLMs
End-to-end fine-tuning alone is insufficient to improve VLMs on this task
Abstract
Despite their promise to perform complex reasoning, large language models (LLMs) have been shown to have limited effectiveness in end-to-end planning. This has inspired an intriguing question: if these models cannot plan well, can they still contribute to the planning framework as a helpful plan evaluator? In this work, we generalize this question to consider LLMs augmented with visual understanding, i.e., Vision-Language Models (VLMs). We introduce PathEval, a novel benchmark evaluating VLMs as plan evaluators in complex path-planning scenarios. Succeeding in the benchmark requires a VLM to be able to abstract traits of optimal paths from the scenario description, demonstrate precise low-level perception on each path, and integrate this information to decide the better path. Our analysis of state-of-the-art VLMs reveals that these models face significant challenges on the benchmark. We…
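To make the task concrete, here is a minimal sketch of what "deciding the better path" could look like when the comparison is done programmatically. Everything in it is an assumption for illustration and is not taken from PathEval itself: the waypoint representation, the circular obstacle, the two criteria (total length and minimum clearance), and the helper names `path_length`, `min_clearance`, and `better_path` are all hypothetical.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Path = List[Point]
Obstacle = Tuple[Point, float]  # hypothetical: (center, radius) of a circular obstacle


def path_length(path: Path) -> float:
    """Sum of straight-line distances between consecutive waypoints."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def min_clearance(path: Path, obstacles: List[Obstacle]) -> float:
    """Smallest waypoint-to-obstacle-boundary distance (waypoints only, not segments)."""
    return min(math.dist(p, center) - radius
               for p in path
               for center, radius in obstacles)


def better_path(path_a: Path, path_b: Path,
                score: Callable[[Path], float],
                lower_is_better: bool = True) -> str:
    """Return "A", "B", or "tie" according to the scenario's criterion."""
    score_a, score_b = score(path_a), score(path_b)
    if score_a == score_b:
        return "tie"
    a_wins = score_a < score_b if lower_is_better else score_a > score_b
    return "A" if a_wins else "B"


# Two illustrative paths between the same start and goal.
path_a: Path = [(0.0, 0.0), (1.5, 2.0), (3.0, 4.0)]  # direct, but grazes the obstacle
path_b: Path = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]  # longer detour, stays clear
obstacles: List[Obstacle] = [((2.0, 2.0), 0.5)]

# Scenario 1: "prefer the shortest path" (lower score wins).
by_length = better_path(path_a, path_b, path_length)

# Scenario 2: "prefer the path that stays farthest from obstacles" (higher score wins).
by_clearance = better_path(
    path_a, path_b,
    lambda p: min_clearance(p, obstacles),
    lower_is_better=False,
)

print(by_length, by_clearance)
```

The same pair of paths is ranked differently under the two criteria, which is why an evaluator has to first work out from the scenario description which trait matters and then perceive each path accurately enough to compare it.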
Taxonomy
Topics: Religion and Sociopolitical Dynamics in Nigeria · Religious Tourism and Spaces · Geographic Information Systems Studies
