CodeGraphVLP: Code-as-Planner Meets Semantic-Graph State for Non-Markovian Vision-Language-Action Models

Khoa Vo; Sieu Tran; Taisei Hanyu; Yuki Ikebe; Duy Nguyen; Bui Duy Quoc Nghi; Minh Vu; Anthony Gunderman; Chase Rainwater; Anh Nguyen; Ngan Le

arXiv:2604.22238·cs.RO·April 27, 2026

TL;DR

CodeGraphVLP introduces a hierarchical approach combining semantic graphs, code-based planning, and visual prompts to improve long-horizon robot manipulation in complex, partially observable environments.

Contribution

The paper presents a hierarchical framework that maintains task-relevant entities and relations in a persistent semantic graph and uses progress-guided prompting to steer visual reasoning, improving on existing VLA models in non-Markovian tasks.

Findings

01. Improves task completion rates on real-world non-Markovian tasks.

02. Reduces planning latency compared to VLM-in-the-loop methods.

03. Demonstrates the effectiveness of semantic graphs and progress-guided prompting.

Abstract

Vision-Language-Action (VLA) models promise generalist robot manipulation, but are typically trained and deployed as short-horizon policies that assume the latest observation is sufficient for action reasoning. This assumption breaks in non-Markovian long-horizon tasks, where task-relevant evidence can be occluded or appear only earlier in the trajectory, and where clutter and distractors make fine-grained visual grounding brittle. We present CodeGraphVLP, a hierarchical framework that enables reliable long-horizon manipulation by combining a persistent semantic-graph state with an executable code-based planner and progress-guided visual-language prompting. The semantic-graph maintains task-relevant entities and relations under partial observability. The synthesized planner executes over this semantic-graph to perform efficient progress checks and outputs a subtask instruction together…
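The interplay between a persistent semantic-graph state and a code-based planner can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `SemanticGraph` class, the `planner` function, the toy task ("put both blocks in the bin"), the entity names, and the simplification that relations are only ever added, never retracted. It shows one plausible reading of the idea: facts observed earlier stay in the graph when objects become occluded, and a cheap programmatic check over the graph selects the next subtask instruction.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticGraph:
    """Hypothetical persistent state: entities plus (subject, relation, object) triples."""
    entities: set = field(default_factory=set)
    relations: set = field(default_factory=set)

    def update(self, visible_entities, observed_relations):
        # Merge what the current frame shows. Nothing is deleted when an
        # entity is absent from a frame, so facts about occluded objects
        # survive (a simplifying assumption of this sketch).
        self.entities |= set(visible_entities)
        self.relations |= set(observed_relations)

    def holds(self, subj, rel, obj):
        # Progress check: a set lookup instead of a VLM query.
        return (subj, rel, obj) in self.relations


def planner(graph):
    """Hypothetical planner for the toy task 'put both blocks in the bin'."""
    for block in ("red_block", "blue_block"):
        if not graph.holds(block, "in", "bin"):
            name = block.replace("_", " ")
            return f"pick up the {name} and place it in the bin"
    return "done"


graph = SemanticGraph()
graph.update({"red_block", "blue_block", "bin"}, set())
print(planner(graph))  # instruction targets the red block

# The red block is placed in the bin, then hidden from view in later frames.
graph.update({"bin"}, {("red_block", "in", "bin")})
graph.update({"blue_block", "bin"}, set())
print(planner(graph))  # instruction targets the blue block
```

In this sketch the second instruction depends on a relation that is no longer visible in the latest frame, which is the non-Markovian situation the abstract describes. A real system would also need a way to retract or revise relations when the scene changes.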
