CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates

Shresth Grover; Priyank Pathak; Akash Kumar; Vibhav Vineet; Yogesh S Rawat

arXiv:2512.10342·cs.CV·December 30, 2025

TL;DR

This paper introduces CoSPlan, a benchmark for evaluating vision-language models in error-prone sequential planning tasks, and proposes Scene Graph Incremental updates (SGI), a training-free method that improves reasoning performance by adding intermediate steps.

Contribution

The paper presents a new benchmark for vision-language models in sequential planning with errors and introduces SGI, a novel, training-free approach that enhances reasoning accuracy.

Findings

01 VLMs struggle with error detection and correction in CoSPlan.

02 SGI improves VLM performance by an average of 5.2%.

03 SGI generalizes to traditional planning and VQA tasks.

Abstract

Large-scale Vision-Language Models (VLMs) exhibit impressive complex reasoning capabilities but remain largely unexplored in visual sequential planning, i.e., executing multi-step actions towards a goal. Additionally, practical sequential planning often involves non-optimal (erroneous) steps, challenging VLMs to detect and correct such steps. We propose the Corrective Sequential Planning Benchmark (CoSPlan) to evaluate VLMs in error-prone, vision-based sequential planning tasks across 4 domains: maze navigation, block rearrangement, image reconstruction, and object reorganization. CoSPlan assesses two key abilities: Error Detection (identifying a non-optimal action) and Step Completion (correcting and completing action sequences to reach the goal). Despite using state-of-the-art reasoning techniques such as Chain-of-Thought and Scene Graphs, VLMs (e.g., Intern-VLM and Qwen2) struggle on…
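To make the Error Detection task concrete, here is a minimal Python sketch on a toy block-rearrangement problem. It is not the paper's SGI method or the benchmark's code: the triple-based scene graph, the `apply_action` update rule, the unmet-relations `distance`, and the greedy "first non-improving step" check are all assumptions chosen for illustration. The greedy check is also a simplification, since an optimal plan can in general contain steps that do not immediately reduce this distance.

```python
from typing import FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, relation, object), hypothetical encoding
Graph = FrozenSet[Triple]
Action = Tuple[str, str]        # (block, new_support): "move block onto support"


def apply_action(graph: Graph, action: Action) -> Graph:
    """Incrementally update the graph: replace the moved block's 'on' edge."""
    block, support = action
    kept = {t for t in graph if not (t[0] == block and t[1] == "on")}
    kept.add((block, "on", support))
    return frozenset(kept)


def distance(graph: Graph, goal: Graph) -> int:
    """Count goal relations that the current graph does not yet satisfy."""
    return len(goal - graph)


def first_non_improving_step(
    start: Graph, actions: List[Action], goal: Graph
) -> Optional[int]:
    """Return the index of the first action that fails to reduce the distance."""
    graph = start
    for i, action in enumerate(actions):
        updated = apply_action(graph, action)
        if distance(updated, goal) >= distance(graph, goal):
            return i
        graph = updated
    return None


# Toy scene: four blocks on a table; the goal stacks A on B on C.
start: Graph = frozenset(
    {("A", "on", "table"), ("B", "on", "table"),
     ("C", "on", "table"), ("D", "on", "table")}
)
goal: Graph = frozenset(
    {("A", "on", "B"), ("B", "on", "C"),
     ("C", "on", "table"), ("D", "on", "table")}
)

# The second action (A onto D) is the deliberately inserted detour.
noisy_plan: List[Action] = [("B", "C"), ("A", "D"), ("A", "B")]
clean_plan: List[Action] = [("B", "C"), ("A", "B")]

print(first_non_improving_step(start, noisy_plan, goal))
print(first_non_improving_step(start, clean_plan, goal))
```

In this toy setting, Step Completion would correspond to continuing from the last good graph with actions that reach `goal`; the sketch stops at detection to stay short.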

Taxonomy

Topics: Multimodal Machine Learning Applications · AI-based Problem Solving and Planning · Reinforcement Learning in Robotics