BeLLA: End-to-End Birds Eye View Large Language Assistant for Autonomous Driving
Karthik Mohan, Sonam Singh, Amit Arvind Kale

TL;DR
BeLLA is an end-to-end large language model system that integrates unified 360-degree bird's-eye-view (BEV) representations with multimodal reasoning for autonomous driving, significantly improving spatial reasoning and question answering capabilities.
Contribution
This work introduces BeLLA, a novel architecture that connects 360-degree BEV representations with large language models for autonomous driving, enhancing spatial reasoning and interpretability.
Findings
Outperforms existing methods on spatial reasoning questions by up to 9.3%
Achieves competitive performance across diverse question types
Demonstrates effective multi-view spatial reasoning in autonomous driving
Abstract
The rapid development of vision-language models (VLMs) and multimodal language models (MLLMs) in autonomous driving research has reshaped the field by enabling richer scene understanding, context-aware reasoning, and more interpretable decision-making. However, much existing work either relies on single-view encoders that fail to exploit the spatial structure of multi-camera systems or operates on aggregated multi-view features that lack a unified spatial representation, which makes it harder to reason about ego-centric directions, object relations, and the wider context. We therefore present BeLLA, an end-to-end architecture that connects unified 360° BEV representations with a large language model for question answering in autonomous driving. We primarily evaluate our work on two benchmarks, NuScenes-QA and DriveLM, where BeLLA consistently…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
