BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations

Thomas Monninger; Shaoyuan Xie; Qi Alfred Chen; Sihao Ding

arXiv:2603.06576 · cs.CV · March 9, 2026

TL;DR

BEVLM introduces a novel framework that combines spatially consistent Bird's-Eye View representations with Large Language Models, enhancing semantic reasoning and driving performance in autonomous systems.

Contribution

This work bridges the gap between BEV spatial representations and LLMs by distilling semantic knowledge, enabling more effective multi-view reasoning in autonomous driving.

Findings

1. LLMs with BEV features improve reasoning accuracy by 46%.
2. Distilling LLM knowledge into BEV enhances safety in critical scenarios by 29%.
3. BEVLM achieves better spatial and semantic integration for autonomous driving.

Abstract

The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic understanding abilities, which are essential for handling complex decision-making and long-tail scenarios. However, existing methods typically feed LLMs with tokens from multi-view and multi-frame images independently, leading to redundant computation and limited spatial consistency. This separation in visual processing hinders accurate 3D spatial reasoning and fails to maintain geometric coherence across views. On the other hand, Bird's-Eye View (BEV) representations learned from geometrically annotated tasks (e.g., object detection) provide spatial structure but lack the semantic richness of foundation vision encoders. To bridge this gap, we propose BEVLM, a framework that connects a spatially consistent and semantically distilled BEV…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning