CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates
Shresth Grover, Priyank Pathak, Akash Kumar, Vibhav Vineet, Yogesh S Rawat

TL;DR
This paper introduces CoSPlan, a benchmark for evaluating vision-language models on error-prone visual sequential planning tasks, and proposes Scene Graph Incremental Updates (SGI), a training-free method that improves reasoning performance by adding intermediate reasoning steps.
Contribution
The paper presents a new benchmark for evaluating vision-language models on error-prone sequential planning and introduces SGI, a novel, training-free approach that improves reasoning accuracy.
Findings
VLMs struggle with error detection and correction in CoSPlan.
SGI improves VLM performance by an average of 5.2%.
SGI generalizes to traditional planning and VQA tasks.
Abstract
Large-scale Vision-Language Models (VLMs) exhibit impressive complex reasoning capabilities, but they remain largely unexplored in visual sequential planning, i.e., executing multi-step actions towards a goal. Additionally, practical sequential planning often involves non-optimal (erroneous) steps, challenging VLMs to detect and correct such steps. We propose the Corrective Sequential Planning Benchmark (CoSPlan) to evaluate VLMs in error-prone, vision-based sequential planning tasks across 4 domains: maze navigation, block rearrangement, image reconstruction, and object reorganization. CoSPlan assesses two key abilities: Error Detection (identifying non-optimal actions) and Step Completion (correcting and completing action sequences to reach the goal). Despite using state-of-the-art reasoning techniques such as Chain-of-Thought and Scene Graphs, VLMs (e.g., Intern-VLM and Qwen2) struggle on…
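To make the two evaluated abilities concrete, here is a minimal toy sketch for the block rearrangement domain. It is an illustration under our own assumptions, not the authors' SGI implementation or the CoSPlan evaluation code. The scene graph is assumed to be a set of (block, "on", support) triples, each action is applied as an incremental update that replaces a single edge, a step counts as non-optimal when it does not reduce the breadth-first-search distance to the goal graph by exactly one, and step completion greedily follows decreasing distance. All helper names (apply_move, first_non_optimal_step, complete_plan, and so on) are hypothetical.

```python
from collections import deque

TABLE = "table"  # hypothetical support node shared by all blocks


def clear_blocks(graph):
    """Blocks with nothing stacked on them, given (block, 'on', support) triples."""
    blocks = {b for b, _, _ in graph}
    supports = {s for _, _, s in graph}
    return blocks - supports


def apply_move(graph, block, dest):
    """Incremental scene-graph update: drop the block's old 'on' edge, add the new one."""
    clear = clear_blocks(graph)
    if block not in clear:
        raise ValueError(f"{block} is not clear")
    if dest != TABLE and (dest == block or dest not in clear):
        raise ValueError(f"cannot place {block} on {dest}")
    old = next(t for t in graph if t[0] == block)
    return (graph - {old}) | {(block, "on", dest)}


def legal_moves(graph):
    """All non-trivial moves of a clear block onto another clear block or the table."""
    clear = clear_blocks(graph)
    for b in sorted(clear):
        current = next(s for x, _, s in graph if x == b)
        for dest in sorted(clear - {b}) + [TABLE]:
            if dest != current:
                yield b, dest


def distances_to_goal(goal):
    """BFS from the goal; moves are reversible, so this is the distance to the goal."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        g = queue.popleft()
        for b, d in legal_moves(g):
            nxt = apply_move(g, b, d)
            if nxt not in dist:
                dist[nxt] = dist[g] + 1
                queue.append(nxt)
    return dist


def first_non_optimal_step(start, goal, plan):
    """Error detection: index of the first illegal or non-optimal action, else None."""
    dist = distances_to_goal(goal)
    graph = start
    for i, (block, dest) in enumerate(plan):
        try:
            nxt = apply_move(graph, block, dest)
        except ValueError:
            return i  # an illegal action is also an error
        if dist[nxt] != dist[graph] - 1:
            return i
        graph = nxt
    return None


def complete_plan(graph, goal):
    """Step completion: greedily pick moves that reduce the distance to the goal by one."""
    dist = distances_to_goal(goal)
    steps = []
    while dist[graph] > 0:
        block, dest = next(
            (b, d) for b, d in legal_moves(graph)
            if dist[apply_move(graph, b, d)] == dist[graph] - 1
        )
        steps.append((block, dest))
        graph = apply_move(graph, block, dest)
    return steps


start = frozenset({("A", "on", "B"), ("B", "on", TABLE), ("C", "on", TABLE)})
goal = frozenset({("A", "on", TABLE), ("B", "on", "C"), ("C", "on", TABLE)})
faulty = [("A", "C"), ("A", TABLE), ("B", "C")]
print(first_non_optimal_step(start, goal, faulty))  # 0
print(complete_plan(apply_move(start, "A", "C"), goal))  # [('A', 'table'), ('B', 'C')]
```

Exhaustive BFS is only practical here because the toy state space is tiny. It serves as a ground-truth oracle for what "non-optimal" means rather than as a model of how a VLM reasons.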
Taxonomy
Topics: Multimodal Machine Learning Applications · AI-based Problem Solving and Planning · Reinforcement Learning in Robotics
