Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting

Yiren Lu; Xin Ye; Burhaneddin Yaman; Jingru Luo; Zhexiao Xiong; Liu Ren; Yu Yin

arXiv:2603.19193 · cs.CV · March 20, 2026

Open Access

TL;DR

This paper introduces Splat2BEV, a framework that enhances Bird's-Eye-View perception for autonomous driving by explicitly reconstructing 3D scenes with Gaussian Splatting, leading to more accurate and interpretable BEV representations.

Contribution

The paper proposes Splat2BEV, which explicitly reconstructs the 3D scene with Gaussian Splatting so that BEV features are geometry-aligned, with the aim of improving both the accuracy and the interpretability of BEV perception.

Findings

01 Reports state-of-the-art performance on the nuScenes and Argoverse datasets.

02 Demonstrates the effectiveness of explicit 3D reconstruction for BEV tasks.

03 Validates the importance of geometry-aligned features for perception accuracy.

Abstract

Bird's-Eye-View (BEV) perception serves as a cornerstone for autonomous driving, offering a unified spatial representation that fuses surrounding-view images to enable reasoning for various downstream tasks, such as semantic segmentation, 3D object detection, and motion prediction. However, most existing BEV perception frameworks adopt an end-to-end training paradigm, where image features are directly transformed into the BEV space and optimized solely through downstream task supervision. This formulation treats the entire perception process as a black box, often lacking explicit 3D geometric understanding and interpretability, leading to suboptimal performance. In this paper, we claim that an explicit 3D representation matters for accurate BEV perception, and we propose Splat2BEV, a Gaussian Splatting-assisted framework for BEV tasks. Splat2BEV aims to learn BEV feature representations…
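To make the idea of a BEV feature representation built from an explicit 3D scene more concrete, here is a minimal sketch, not the authors' implementation. It assumes each Gaussian carries a center, a single isotropic scale, an opacity, and a feature vector, and it accumulates those features on a top-down grid with a normalized weighted average. The names `Gaussian3D` and `splat_to_bev`, the grid extents, and the cell size are all hypothetical, and the sketch ignores the anisotropic covariances and alpha compositing used in full Gaussian Splatting renderers.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gaussian3D:
    # Hypothetical container: an isotropic Gaussian with an attached feature.
    x: float
    y: float
    z: float              # height; ignored by the top-down projection below
    sigma: float          # isotropic standard deviation in metres (assumption)
    opacity: float
    feature: List[float]


def splat_to_bev(
    gaussians: List[Gaussian3D],
    x_range: Tuple[float, float] = (-4.0, 4.0),
    y_range: Tuple[float, float] = (-4.0, 4.0),
    cell: float = 1.0,
):
    """Orthographically splat Gaussian features onto a ground-plane grid.

    Returns (feat, weight): feat[iy][ix] is the weight-normalized feature of
    a cell, weight[iy][ix] is the accumulated opacity-weighted density.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    dim = len(gaussians[0].feature) if gaussians else 0
    feat = [[[0.0] * dim for _ in range(nx)] for _ in range(ny)]
    weight = [[0.0] * nx for _ in range(ny)]

    for g in gaussians:
        for iy in range(ny):
            cy = y_range[0] + (iy + 0.5) * cell      # cell-centre y
            for ix in range(nx):
                cx = x_range[0] + (ix + 0.5) * cell  # cell-centre x
                d2 = (cx - g.x) ** 2 + (cy - g.y) ** 2
                w = g.opacity * math.exp(-0.5 * d2 / (g.sigma ** 2))
                weight[iy][ix] += w
                for k in range(dim):
                    feat[iy][ix][k] += w * g.feature[k]

    # Normalize so each covered cell holds a weighted average of features.
    for iy in range(ny):
        for ix in range(nx):
            if weight[iy][ix] > 1e-8:
                feat[iy][ix] = [f / weight[iy][ix] for f in feat[iy][ix]]
    return feat, weight
```

Calling `splat_to_bev` on a list of `Gaussian3D` objects yields an `ny × nx` grid of feature vectors that a downstream head (for example, a segmentation head) could consume. The per-cell loop is quadratic in grid size and is meant only to show the geometry, not to be efficient.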


Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Visual Attention and Saliency Detection