InstanceBEV: Unifying Instance and BEV Representation for 3D Panoptic Segmentation

Feng Li; Zhaoyue Wang; Enyuan Zhang; Mohammad Masum Billah; Yunduan Cui; Kun Xu

arXiv:2505.13817·cs.CV·September 24, 2025

TL;DR

InstanceBEV unifies instance and BEV representations to make 3D panoptic segmentation for autonomous driving more efficient and more accurate, and it supports multi-task learning without additional modules while using only 8 frames.

Contribution

The paper introduces InstanceBEV, an approach that combines map-centric and object-centric methods for 3D perception in BEV space. It targets two problems of existing BEV models: the cost of modeling a large feature space and the difficulty of integrating global attention into it.

Findings

01 Achieves 15.3 RayPQ and 38.2 RayIoU on OCC3D-nuScenes with 8 frames.

02 Outperforms SparseOcc by 9.3% in RayPQ and 10.7% in RayIoU.

03 Enables multi-task learning without additional modules.

Abstract

BEV-based 3D perception has emerged as a focal point of research in end-to-end autonomous driving. However, existing BEV approaches face significant challenges from their large feature space, which complicates efficient modeling and hinders the integration of global attention mechanisms. We propose a novel modeling strategy, called InstanceBEV, that combines the strengths of map-centric and object-centric approaches. Our method extracts instance-level features within the BEV features, which makes global attention modeling possible in a highly compressed feature space and thereby addresses the efficiency challenges inherent in map-centric global modeling. Furthermore, our approach enables effective multi-task learning without introducing additional modules. We validate the efficiency and accuracy of the proposed model through…
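To make the efficiency argument in the abstract concrete, the following back-of-the-envelope sketch counts attention score entries for global self-attention over every BEV cell versus attention routed through a small set of instance tokens. It is not the paper's architecture: the 200 × 200 grid, the 100 instance tokens, the function names, and the "one cross-attention pass plus one self-attention pass" cost model are all assumptions chosen for illustration.

```python
# Illustrative cost comparison only. Grid size, token count, and the cost
# model are assumed values, not figures or design details from the paper.


def dense_attention_scores(num_cells: int) -> int:
    """Score entries for global self-attention over every BEV cell."""
    return num_cells * num_cells


def instance_attention_scores(num_cells: int, num_instances: int) -> int:
    """Score entries when BEV cells are first pooled into instance tokens.

    Counts one cross-attention pass (instance tokens attend to BEV cells)
    plus one self-attention pass among the instance tokens.
    """
    cross = num_instances * num_cells
    self_attn = num_instances * num_instances
    return cross + self_attn


BEV_HEIGHT, BEV_WIDTH = 200, 200  # hypothetical BEV grid resolution
NUM_INSTANCES = 100               # hypothetical number of instance tokens

num_cells = BEV_HEIGHT * BEV_WIDTH
dense = dense_attention_scores(num_cells)
compressed = instance_attention_scores(num_cells, NUM_INSTANCES)

print(f"dense global attention:   {dense:,} score entries")
print(f"instance-level attention: {compressed:,} score entries")
print(f"reduction factor:         {dense / compressed:.1f}x")
```

Under these assumed sizes the dense variant scales quadratically with the number of BEV cells, while the instance-level variant scales linearly with it, which is the kind of saving a compressed feature space is meant to provide.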

Taxonomy

Topics: 3D Shape Modeling and Analysis · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis