Bridging Structure and Language: Graph-Based Visual Reasoning for Autonomous Road Understanding

Lena Wild; Katie Z Luo; Marco Pavone

arXiv:2605.20942 · cs.CV · May 21, 2026

TL;DR

This paper introduces the Combined Road Substrate (CRS), a graph-based framework that combines geometric road structure with open-vocabulary semantics to improve structured reasoning in autonomous driving systems.

Contribution

CRS bridges the gap between semantic flexibility and structural precision by enabling joint execution of geometric and linguistic reasoning in a unified graph-grounded framework.

Findings

01. CRS enables automatic generation of complex question-answer pairs for road reasoning.

02. Training small models on CRS-enriched scenes improves compositional reasoning performance.

03. CRS-trained models show fewer failure modes, with remaining errors mainly limited to attribute recognition.

Abstract

Structured road understanding of lane geometry, topology, and traffic element relationships is foundational to safe autonomous driving. While vision-language models (VLMs) offer promising semantic flexibility, they lack the geometric and relational grounding required for precise road reasoning. Conversely, traditional modular systems, e.g., HD maps and topological road graphs, provide structural precision but remain semantically rigid. To bridge this gap, we introduce the Combined Road Substrate (CRS), a graph-grounded framework that makes geometric road structure and open-vocabulary semantics jointly executable in a single representation. CRS enables the automatic generation of compositionally complex and linguistically varied question-answer pairs via recursive graph queries, augmented with a "grounding for free" mechanism that ensures logical traceability to specific map elements,…
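The idea of generating question-answer pairs through recursive graph queries, with each answer traceable to specific map elements, can be illustrated with a toy example. The following is a minimal sketch and not the authors' implementation: the graph schema, the relation names (`successor`, `governed_by`), the phrase templates, and the helpers `follow` and `make_qa` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str                                  # e.g. "lane" or "traffic_element" (hypothetical kinds)
    attrs: dict = field(default_factory=dict)  # free-form semantic attributes


class RoadGraph:
    """Toy road graph: nodes are map elements, edges are typed relations."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}  # (source id, relation name) -> list of target ids

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, src, relation, dst):
        self.edges.setdefault((src, relation), []).append(dst)

    def neighbors(self, src, relation):
        return self.edges.get((src, relation), [])


def follow(graph, start_ids, relations):
    """Recursively follow a chain of relations.

    Returns (reached ids, ids visited along the way in hop order).
    """
    if not relations:
        return list(start_ids), list(start_ids)
    next_ids = []
    for nid in start_ids:
        for dst in graph.neighbors(nid, relations[0]):
            if dst not in next_ids:
                next_ids.append(dst)
    reached, trace = follow(graph, next_ids, relations[1:])
    return reached, list(start_ids) + trace


# Hypothetical natural-language templates, one per relation.
PHRASES = {
    "successor": "the lane following",
    "governed_by": "the traffic element governing",
}


def make_qa(graph, lane_id, relations, attr):
    """Build one question-answer pair plus the map elements that ground it."""
    reached, trace = follow(graph, [lane_id], relations)
    phrase = f"lane {lane_id}"
    for rel in relations:  # wrap the phrase once per hop, innermost first
        phrase = f"{PHRASES[rel]} {phrase}"
    answer = sorted({graph.nodes[n].attrs[attr]
                     for n in reached if attr in graph.nodes[n].attrs})
    return {
        "question": f"What is the {attr} of {phrase}?",
        "answer": answer,
        "grounding": trace,  # every element touched while answering
    }


graph = RoadGraph()
graph.add_node(Node("L1", "lane"))
graph.add_node(Node("L2", "lane"))
graph.add_node(Node("TL1", "traffic_element", {"state": "red"}))
graph.add_edge("L1", "successor", "L2")
graph.add_edge("L2", "governed_by", "TL1")

qa = make_qa(graph, "L1", ["successor", "governed_by"], "state")
print(qa)
```

Because the answer is computed by walking the graph, the list of visited elements falls out of the query itself, which is one plausible reading of how grounding can come at no extra annotation cost. Longer relation chains yield more compositional questions without changing the code.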
