Bridging the Modality Gap in Roadside LiDAR: A Training-Free Vision-Language Model Framework for Vehicle Classification

Yiqiao Li; Bo Shang; Jie Wei

arXiv:2602.09425 · cs.CV · February 11, 2026

Open Access

TL;DR

This paper introduces a training-free framework that adapts vision-language models to classify vehicles from roadside LiDAR data by converting sparse 3D scans into depth-encoded 2D images, enabling scalable, few-shot vehicle classification.

Contribution

The paper presents a depth-aware image generation pipeline and demonstrates vehicle classification without fine-tuning VLMs, which reduces the manual labeling effort in intelligent transportation systems (ITS) applications.

Findings

01. Achieves over 75% accuracy in classifying specific vehicle categories with minimal examples.

02. Effectively uses VLMs for ultra-low-shot classification, especially with fewer than 4 examples.

03. Provides a scalable, training-free approach suitable for real-world ITS deployment.

Abstract

Fine-grained truck classification is critical for intelligent transportation systems (ITS), yet current LiDAR-based methods face scalability challenges due to their reliance on supervised deep learning and labor-intensive manual annotation. Vision-Language Models (VLMs) offer promising few-shot generalization, but their application to roadside LiDAR is limited by a modality gap between sparse 3D point clouds and dense 2D imagery. We propose a framework that bridges this gap by adapting off-the-shelf VLMs for fine-grained truck classification without parameter fine-tuning. Our new depth-aware image generation pipeline applies noise removal, spatial and temporal registration, orientation rectification, morphological operations, and anisotropic smoothing to transform sparse, occluded LiDAR scans into depth-encoded 2D visual proxies. Validated on a real-world dataset of 20 vehicle classes,…
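To make the idea of a depth-encoded 2D visual proxy concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a single vehicle's points are already isolated and registered, projects them onto a side-view plane with nearer points drawn brighter, and applies one morphological closing to fill small gaps left by sparse returns. The function names (`depth_image`, `close3x3`), the axis convention (x along the vehicle, y as range from the sensor, z as height), and the image size are all illustrative assumptions; noise removal, registration, orientation rectification, and anisotropic smoothing are omitted.

```python
import numpy as np

def depth_image(points, height=32, width=96):
    """Project an (N, 3) point cloud onto the x-z plane (hypothetical side view).

    Range from the sensor (y) is encoded as intensity: nearer points are
    brighter, and 0 is reserved for pixels that received no LiDAR return.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Normalise x to columns and z to rows (row 0 is the top of the vehicle).
    cols = ((x - x.min()) / max(np.ptp(x), 1e-9) * (width - 1)).round().astype(int)
    rows = ((z.max() - z) / max(np.ptp(z), 1e-9) * (height - 1)).round().astype(int)
    # Map range to intensities in [1, 255]; the nearest point gets 255.
    near = 1.0 - (y - y.min()) / max(np.ptp(y), 1e-9)
    vals = (1 + near * 254).astype(np.uint8)
    img = np.zeros((height, width), dtype=np.uint8)
    # When several points fall in one pixel, keep the brightest (nearest) one.
    np.maximum.at(img, (rows, cols), vals)
    return img

def _filter3x3(img, reduce_fn, pad_value):
    """Apply a 3x3 min or max filter using shifted views of a padded image."""
    padded = np.pad(img, 1, mode="constant", constant_values=pad_value)
    h, w = img.shape
    windows = [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return reduce_fn(np.stack(windows), axis=0)

def close3x3(img):
    """Grey-scale morphological closing: dilation followed by erosion."""
    dilated = _filter3x3(img, np.max, 0)      # pad with 0 so borders do not grow
    return _filter3x3(dilated, np.min, 255)   # pad with 255 so borders do not shrink

# Synthetic stand-in for one sparse vehicle scan (not real LiDAR data).
rng = np.random.default_rng(0)
pts = rng.uniform([0.0, 5.0, 0.0], [12.0, 7.5, 4.0], size=(400, 3))
img = depth_image(pts)
closed = close3x3(img)
```

The resulting `closed` array is an 8-bit image that could be saved and passed to an off-the-shelf VLM alongside a few labeled examples. How the paper actually composes its prompts and selects examples is not stated in the abstract, so that step is left out here.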

Taxonomy

Topics: Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety · Robotics and Sensor-Based Localization