From Steering to Pedalling: Do Autonomous Driving VLMs Generalize to Cyclist-Assistive Spatial Perception and Planning?

Krishna Kanth Nakka; Vedasri Nakka

arXiv:2602.10771 · cs.CV · February 12, 2026

TL;DR

This paper introduces CyclingVQA, a benchmark for evaluating vision-language models on cyclist-centric perception and reasoning, revealing strengths and gaps in current models for cyclist-assistive traffic understanding.

Contribution

The paper presents CyclingVQA, a new diagnostic benchmark for cyclist-centric perception and reasoning, and evaluates 31+ VLMs to identify their strengths and limitations for cyclist-assistive applications.

Findings

01

Current models show promising capabilities but need improvement in handling cyclist-specific cues.

02

Several models underperform general-purpose models in cyclist-assistive scenarios.

03

Error analysis highlights key failure modes to guide future development.

Abstract

Cyclists often encounter safety-critical situations in urban traffic, highlighting the need for assistive systems that support safe and informed decision-making. Recently, vision-language models (VLMs) have demonstrated strong performance on autonomous driving benchmarks, suggesting their potential for general traffic understanding and navigation-related reasoning. However, existing evaluations are predominantly vehicle-centric and fail to assess perception and reasoning from a cyclist-centric viewpoint. To address this gap, we introduce CyclingVQA, a diagnostic benchmark designed to probe perception, spatio-temporal understanding, and traffic-rule-to-lane reasoning from a cyclist's perspective. Evaluating 31+ recent VLMs spanning general-purpose, spatially enhanced, and autonomous-driving-specialized models, we find that current models demonstrate encouraging capabilities, while also…
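The abstract does not say how CyclingVQA scores models. As a minimal sketch, assuming each item is a multiple-choice question tagged with one of the capability areas named above and graded by exact match on the chosen option, a diagnostic per-category accuracy breakdown could look like this. The `Prediction` record, the category labels, and the toy answers are hypothetical illustrations, not the paper's or the dataset's actual format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical record for one benchmark item: its capability category,
    # the gold answer option, and the option the model chose.
    category: str
    gold: str
    predicted: str


def per_category_accuracy(preds):
    """Return {category: accuracy} plus a micro-averaged 'overall' entry."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p in preds:
        total[p.category] += 1
        # Exact match after normalizing case and surrounding whitespace.
        if p.predicted.strip().upper() == p.gold.strip().upper():
            correct[p.category] += 1
    scores = {c: correct[c] / total[c] for c in total}
    n = sum(total.values())
    scores["overall"] = sum(correct.values()) / n if n else 0.0
    return scores


# Toy, made-up predictions for illustration only.
preds = [
    Prediction("perception", "A", "A"),
    Prediction("perception", "B", "C"),
    Prediction("spatio-temporal", "C", "c"),
    Prediction("traffic-rule-to-lane", "D", "A"),
]
print(per_category_accuracy(preds))
# {'perception': 0.5, 'spatio-temporal': 1.0, 'traffic-rule-to-lane': 0.0, 'overall': 0.5}
```

The "overall" score here is micro-averaged, so larger categories weigh more; averaging the per-category scores instead (a macro average) would give each capability area equal weight, which can matter when category sizes differ.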

Datasets

KKNakka/CyclingVQA

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Human-Automation Interaction and Safety · Multimodal Machine Learning Applications