Bridging the Modality Gap in Roadside LiDAR: A Training-Free Vision-Language Model Framework for Vehicle Classification
Yiqiao Li, Bo Shang, Jie Wei

TL;DR
This paper introduces a training-free framework that adapts vision-language models to classify vehicles from roadside LiDAR data by converting sparse 3D scans into depth-encoded 2D images, enabling scalable, few-shot vehicle classification.
Contribution
It presents a novel depth-aware image generation pipeline and demonstrates effective vehicle classification without fine-tuning VLMs, reducing the manual labeling effort required in ITS applications.
Findings
Achieves over 75% accuracy in classifying specific vehicle categories with minimal examples.
Effectively uses VLMs for ultra-low-shot classification, especially with fewer than 4 examples.
Provides a scalable, training-free approach suitable for real-world ITS deployment.
Abstract
Fine-grained truck classification is critical for intelligent transportation systems (ITS), yet current LiDAR-based methods face scalability challenges due to their reliance on supervised deep learning and labor-intensive manual annotation. Vision-Language Models (VLMs) offer promising few-shot generalization, but their application to roadside LiDAR is limited by a modality gap between sparse 3D point clouds and dense 2D imagery. We propose a framework that bridges this gap by adapting off-the-shelf VLMs for fine-grained truck classification without parameter fine-tuning. Our new depth-aware image generation pipeline applies noise removal, spatial and temporal registration, orientation rectification, morphological operations, and anisotropic smoothing to transform sparse, occluded LiDAR scans into depth-encoded 2D visual proxies. Validated on a real-world dataset of 20 vehicle classes,…
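To make the idea of a depth-encoded 2D visual proxy concrete, here is a minimal sketch, not the authors' implementation. It assumes points are given as (x, y, z) tuples with x along the vehicle length, y as range from the sensor, and z as height. The function names `depth_image` and `dilate`, the image size, and the intensity scale are all hypothetical choices made for illustration. A plain grayscale dilation stands in for the morphological step; noise removal, registration, orientation rectification, and anisotropic smoothing are omitted.

```python
def depth_image(points, width=64, height=32):
    """Project (x, y, z) points onto the x-z plane and encode range y as intensity.

    Assumed convention (not from the paper): nearer points are brighter,
    values lie in 1..255, and 0 marks an empty pixel.
    """
    image = [[0] * width for _ in range(height)]
    if not points:
        return image

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    zs = [p[2] for p in points]
    x_min, y_min, z_min = min(xs), min(ys), min(zs)
    # Fall back to 1.0 when all points share a coordinate, avoiding division by zero.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    z_span = (max(zs) - z_min) or 1.0

    for x, y, z in points:
        col = int((x - x_min) / x_span * (width - 1) + 0.5)
        # Flip rows so that larger z (higher up) appears nearer the top of the image.
        row = height - 1 - int((z - z_min) / z_span * (height - 1) + 0.5)
        value = 255 - int((y - y_min) / y_span * 254)
        # When several points fall in one pixel, keep the nearest (brightest).
        image[row][col] = max(image[row][col], value)
    return image


def dilate(image, radius=1):
    """Grayscale dilation with a square window, used here to fill small gaps."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            best = 0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        best = max(best, image[rr][cc])
            out[r][c] = best
    return out


# Tiny usage example: four points rendered into a 5x3 image, then gap-filled.
sample = [(0, 5, 0), (2, 5, 0), (0, 6, 1), (2, 7, 1)]
img = depth_image(sample, width=5, height=3)
filled = dilate(img, radius=1)
```

In a setup like this, the resulting 2D array could be saved as a grayscale image and passed to a VLM alongside a few labeled examples; how the paper actually constructs its prompts is not described in the text above.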
Taxonomy
Topics: Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety · Robotics and Sensor-Based Localization
