CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers

Runsheng Xu; Zhengzhong Tu; Hao Xiang; Wei Shao; Bolei Zhou; Jiaqi Ma

arXiv:2207.02202 · cs.CV · September 27, 2022 · 78 cites

TL;DR

CoBEVT introduces a cooperative multi-agent framework using sparse Transformers for enhanced bird's eye view semantic segmentation, significantly improving perception accuracy and range in autonomous driving scenarios.

Contribution

CoBEVT is, by the authors' account, the first multi-agent, multi-camera perception framework for cooperative BEV map prediction; its core component is a novel fused axial attention (FAX) module.
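
A minimal NumPy sketch of the sparse attention pattern behind a fused axial attention module, for illustration only. It assumes (this is not stated in the summary above) that FAX alternates local self-attention inside non-overlapping p x p windows with global self-attention over a dilated g x g grid, both applied jointly across the stacked BEV feature maps of all N agents, in the style of MaxViT's block and grid attention. Learned query/key/value projections, multiple heads, normalization and MLP layers are left out, and local_window_attention, global_grid_attention and fax_block are hypothetical names.

```python
import numpy as np

# Hypothetical FAX-style sketch: identity Q/K/V, single head, no norm/MLP.
# Not the CoBEVT authors' implementation.

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Scaled dot-product self-attention over the last two axes (..., L, C)."""
    c = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(c)  # (..., L, L)
    return softmax(scores) @ tokens

def local_window_attention(x, p):
    """Attend within non-overlapping p x p windows, jointly across all N agents.

    x: (N, H, W, C) stacked BEV features; H and W must be divisible by p.
    """
    n, h, w, c = x.shape
    # Axes (n, win_row, in_row, win_col, in_col, c) -> one token group per window
    win = x.reshape(n, h // p, p, w // p, p, c).transpose(1, 3, 0, 2, 4, 5)
    out = self_attention(win.reshape(h // p, w // p, n * p * p, c))
    out = out.reshape(h // p, w // p, n, p, p, c).transpose(2, 0, 3, 1, 4, 5)
    return out.reshape(n, h, w, c)

def global_grid_attention(x, g):
    """Attend within dilated g x g grids whose points lie H/g rows and W/g columns apart.

    x: (N, H, W, C); H and W must be divisible by g.
    """
    n, h, w, c = x.shape
    # Axes (n, grid_row, offset_row, grid_col, offset_col, c) -> one group per offset
    grid = x.reshape(n, g, h // g, g, w // g, c).transpose(2, 4, 0, 1, 3, 5)
    out = self_attention(grid.reshape(h // g, w // g, n * g * g, c))
    out = out.reshape(h // g, w // g, n, g, g, c).transpose(2, 3, 0, 4, 1, 5)
    return out.reshape(n, h, w, c)

def fax_block(x, p, g):
    """Local window attention, then global grid attention, each with a residual connection."""
    x = x + local_window_attention(x, p)
    return x + global_grid_attention(x, g)

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 32, 32, 16))  # 3 agents, 32 x 32 BEV grid, 16 channels
print(fax_block(feats, p=8, g=8).shape)  # (3, 32, 32, 16)
```

With 3 agents on a 32 x 32 grid and p = g = 8, each stage scores 16 groups of 192 tokens, i.e. 16 x 192^2 = 589,824 query-key pairs, against 3,072^2 = 9,437,184 pairs for dense attention over all 3,072 tokens. Because every 8 x 8 window contains every grid offset at these sizes, the two stages together still let any cell draw on any other cell, in any agent's map, within one block.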

Findings

1. Achieves state-of-the-art performance on OPV2V, a V2V perception dataset.
2. Generalizes to single-agent BEV segmentation and multi-agent 3D detection.
3. Runs at real-time inference speed.

Abstract

Bird's eye view (BEV) semantic segmentation plays a crucial role in spatial sensing for autonomous driving. Although recent studies have made significant progress on BEV map understanding, they are all based on single-agent, camera-based systems. These solutions sometimes have difficulty handling occlusions or detecting distant objects in complex traffic scenes. Vehicle-to-Vehicle (V2V) communication technologies have enabled autonomous vehicles to share sensing information, dramatically improving perception performance and range compared to single-agent systems. In this paper, we propose CoBEVT, the first generic multi-agent multi-camera perception framework that can cooperatively generate BEV map predictions. To efficiently fuse camera features from multi-view and multi-agent data in an underlying Transformer architecture, we design a fused axial attention module (FAX), which…
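
Sharing sensing information over V2V raises a practical step: a neighbor's BEV features are expressed in that neighbor's own frame and have to be moved into the ego vehicle's frame before they can be fused. The sketch below is a hedged illustration, assuming agents exchange BEV feature maps together with a relative 2D pose (yaw plus translation); the grid convention, the nearest-neighbor resampling and the names rigid_2d and warp_to_ego are all hypothetical, not CoBEVT's code.

```python
import numpy as np

# Hypothetical sketch of aligning a neighbor's BEV map to the ego grid.

def rigid_2d(yaw, tx, ty):
    """3x3 homogeneous transform taking points from the neighbor frame to the ego frame."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]])

def warp_to_ego(feat, ego_from_nbr, res):
    """Nearest-neighbor resampling of a neighbor BEV map (H, W, C) onto the ego grid.

    Assumed convention: both grids are centered on their own agent, `res` meters per
    cell, x grows with column index and y with row index.
    Returns the warped map and a mask of ego cells the neighbor's map covers.
    """
    h, w, _ = feat.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Metric position of every ego cell center, in homogeneous ego-frame coordinates
    pts = np.stack([(cols - w / 2 + 0.5) * res,
                    (rows - h / 2 + 0.5) * res,
                    np.ones((h, w))], axis=-1)
    # Inverse warp: locate each ego cell in the neighbor's frame, then in its grid
    nbr = pts @ np.linalg.inv(ego_from_nbr).T
    src_c = np.round(nbr[..., 0] / res + w / 2 - 0.5).astype(int)
    src_r = np.round(nbr[..., 1] / res + h / 2 - 0.5).astype(int)
    valid = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    out = np.zeros_like(feat)  # cells outside the neighbor's map stay zero
    out[valid] = feat[src_r[valid], src_c[valid]]
    return out, valid

# A neighbor 1 m along +x on a 0.5 m grid: its map lands 2 columns to the right
nbr_feat = np.arange(4 * 6, dtype=float).reshape(4, 6, 1)
warped, mask = warp_to_ego(nbr_feat, rigid_2d(0.0, 1.0, 0.0), res=0.5)
print(mask[0].astype(int))  # [0 0 1 1 1 1]
```

After warping, ego and neighbor maps share one grid and can be stacked along an agent axis as input to a fusion step; the mask marks ego cells the neighbor cannot see. Bilinear sampling would be a smoother alternative to the nearest-neighbor lookup used here.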

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Robotics and Sensor-Based Localization

Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing