Probing Visual Concepts in Lightweight Vision-Language Models for Automated Driving

Nikos Theodoridis; Reenu Mohandas; Ganesh Sistu; Anthony Scanlan; Ciarán Eising; Tim Brophy

arXiv:2603.06054·cs.CV·March 9, 2026

Open Access

TL;DR

This study investigates how vision-language models encode visual concepts relevant to automated driving, revealing their strengths and limitations in representing and reasoning about visual information.

Contribution

The paper introduces a method to analyze the linear encoding of visual concepts in VLMs and identifies specific failure modes affecting their performance in automated driving scenarios.

Findings

1. Object presence is explicitly linearly encoded in VLMs.
2. Spatial concepts like orientation are implicitly encoded.
3. Increasing object distance reduces linear separability of concepts.

Abstract

The use of Vision-Language Models (VLMs) in automated driving applications is becoming increasingly common, with the aim of leveraging their reasoning and generalisation capabilities to handle long-tail scenarios. However, these models often fail on simple visual questions that are highly relevant to automated driving, and the reasons behind these failures remain poorly understood. In this work, we examine the intermediate activations of VLMs and assess the extent to which specific visual concepts are linearly encoded, with the goal of identifying bottlenecks in the flow of visual information. Specifically, we create counterfactual image sets that differ only in a targeted visual concept and then train linear probes to distinguish between them using the activations of four state-of-the-art (SOTA) VLMs. Our results show that concepts such as the presence of an object or agent in a scene…
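To make the probing idea in the abstract concrete, the following is a minimal sketch of a linear probe and not the authors' implementation. It assumes synthetic vectors in place of real VLM activations: each counterfactual pair shares the same "scene" vector, and the concept-present member is shifted along a single hidden direction. The function names, the `shift` parameter, and the pair-wise train/test split are all hypothetical choices made for illustration.

```python
import numpy as np


def make_synthetic_pairs(n_pairs=200, dim=64, shift=4.0, seed=0):
    """Hypothetical stand-in for activations of counterfactual image pairs.

    Both members of a pair share the same base vector; the concept-present
    member is shifted along one hidden direction (an assumption, not real data).
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    base = rng.normal(size=(n_pairs, dim))
    noise = 0.1 * rng.normal(size=(n_pairs, dim))
    absent = base
    present = base + shift * direction + noise
    return absent, present


def train_linear_probe(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression probe with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def run_probe(shift, n_pairs=200, n_train_pairs=150, seed=0):
    """Train on some pairs, report accuracy on held-out pairs."""
    absent, present = make_synthetic_pairs(n_pairs=n_pairs, shift=shift, seed=seed)

    # Split by pair so both members of a pair land on the same side.
    def stack(a, p):
        X = np.vstack([a, p])
        y = np.concatenate([np.zeros(len(a)), np.ones(len(p))])
        return X, y

    X_tr, y_tr = stack(absent[:n_train_pairs], present[:n_train_pairs])
    X_te, y_te = stack(absent[n_train_pairs:], present[n_train_pairs:])

    # Centre features using training statistics only.
    mu = X_tr.mean(axis=0)
    w, b = train_linear_probe(X_tr - mu, y_tr)
    pred = ((X_te - mu) @ w + b) > 0
    return float(np.mean(pred == (y_te == 1)))


if __name__ == "__main__":
    for shift in (4.0, 0.5):
        print(f"shift={shift}: held-out probe accuracy = {run_probe(shift):.2f}")
```

High held-out accuracy in this toy setup indicates that the concept is linearly decodable from the vectors; lowering `shift` weakens the signal and the probe's accuracy drops. With a real model, the synthetic vectors would be replaced by activations extracted from a chosen layer, which this sketch does not attempt.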

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Visual Attention and Saliency Detection