Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer

Shaoyu Chen; Tianheng Cheng; Xinggang Wang; Wenming Meng; Qian Zhang; Wenyu Liu

arXiv:2206.04584·cs.CV·June 10, 2022·28 cites

TL;DR

This paper introduces a Geometry-guided Kernel Transformer (GKT) for efficient, robust 2D-to-BEV representation learning from surround-view cameras, achieving real-time performance and state-of-the-art segmentation accuracy for autonomous driving.

Contribution

The paper proposes GKT, a 2D-to-BEV mechanism that uses geometric priors to guide attention, together with a look-up table (LUT) indexing method that removes the need for the camera's calibrated parameters at runtime, enabling fast, robust, and accurate BEV perception.

Findings

01

GKT runs at 72.3 FPS on a 3090 GPU and 45.6 FPS on a 2080 Ti GPU.

02

Achieves 38.0 mIoU on the nuScenes validation set (100 m × 100 m perception range at 0.5 m resolution).

03

Demonstrates robustness to camera deviation and to the choice of predefined BEV height.
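
The throughput and range figures above can be restated as per-frame latency and BEV grid size. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and it assumes the reported FPS corresponds to one frame processed at a time and that the 100 m × 100 m range at 0.5 m resolution (stated in the abstract) maps to a square grid.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def bev_grid_side(range_m: float, resolution_m: float) -> int:
    """Number of BEV cells along one side of a square perception range."""
    return round(range_m / resolution_m)


# Reported throughput figures (GPU names as written on this page).
for gpu, fps in (("3090", 72.3), ("2080ti", 45.6)):
    print(f"{gpu}: {fps_to_latency_ms(fps):.1f} ms per frame")
# prints "3090: 13.8 ms per frame" and "2080ti: 21.9 ms per frame"

side = bev_grid_side(100.0, 0.5)
print(f"BEV grid: {side} x {side} = {side * side} cells")
# prints "BEV grid: 200 x 200 = 40000 cells"
```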

Abstract

Learning Bird's Eye View (BEV) representations from surround-view cameras is of great importance for autonomous driving. In this work, we propose the Geometry-guided Kernel Transformer (GKT), a novel 2D-to-BEV representation learning mechanism. GKT leverages geometric priors to guide the transformer to focus on discriminative regions and unfolds kernel features to generate the BEV representation. For fast inference, we further introduce a look-up table (LUT) indexing method that removes the need for the camera's calibrated parameters at runtime. GKT runs at 72.3 FPS on a 3090 GPU and 45.6 FPS on a 2080 Ti GPU, and is robust to camera deviation and to the predefined BEV height. GKT also achieves state-of-the-art real-time segmentation results, i.e., 38.0 mIoU (100 m × 100 m perception range at 0.5 m resolution) on the nuScenes val set. Given the efficiency, effectiveness, and robustness, GKT…
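
The LUT idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single pinhole camera, a flat BEV plane at one predefined height, and nearest-pixel rounding, and every name in it (`project_point`, `build_lut`, `gather_kernel_features`, the toy intrinsics and extrinsics) is hypothetical. It covers only the offline projection and the runtime gather of kernel features; the attention between BEV queries and the gathered features is omitted.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def project_point(K: Matrix, R: Matrix, t: Sequence[float],
                  p: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a 3D point; None if it lies behind the camera."""
    xc = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 1e-6:
        return None
    return (K[0][0] * xc[0] / xc[2] + K[0][2],
            K[1][1] * xc[1] / xc[2] + K[1][2])


def build_lut(K: Matrix, R: Matrix, t: Sequence[float], grid_side: int,
              resolution: float, height: float, feat_w: int, feat_h: int,
              kernel: int = 3) -> List[List[int]]:
    """Offline step: for each BEV cell, store the flat feature-map indices
    of the kernel x kernel window around its projection (-1 = invalid)."""
    half = kernel // 2
    lut = []
    for row in range(grid_side):          # row indexes the ego y axis
        for col in range(grid_side):      # col indexes the ego x axis
            x = (col + 0.5 - grid_side / 2) * resolution
            y = (row + 0.5 - grid_side / 2) * resolution
            uv = project_point(K, R, t, (x, y, height))
            if uv is None:
                lut.append([-1] * (kernel * kernel))
                continue
            cu, cv = round(uv[0]), round(uv[1])
            window = []
            for dv in range(-half, half + 1):
                for du in range(-half, half + 1):
                    u, v = cu + du, cv + dv
                    inside = 0 <= u < feat_w and 0 <= v < feat_h
                    window.append(v * feat_w + u if inside else -1)
            lut.append(window)
    return lut


def gather_kernel_features(lut: List[List[int]], flat_feats: Sequence[float],
                           pad: float = 0.0) -> List[List[float]]:
    """Runtime step: pure index lookups, no camera parameters involved."""
    return [[flat_feats[i] if i >= 0 else pad for i in window]
            for window in lut]


# Toy forward-facing camera (assumed values): 1.5 m above the ground,
# looking along ego +x; camera axes are x right, y down, z forward.
K_toy = [[100.0, 0.0, 50.0], [0.0, 100.0, 40.0], [0.0, 0.0, 1.0]]
R_toy = [[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]]
t_toy = [0.0, 1.5, 0.0]

lut = build_lut(K_toy, R_toy, t_toy, grid_side=4, resolution=5.0,
                height=0.0, feat_w=100, feat_h=80)
feats = [float(i) for i in range(100 * 80)]   # stand-in for image features
kernel_feats = gather_kernel_features(lut, feats)
print(len(kernel_feats), "BEV cells,", len(kernel_feats[0]), "features each")
```

Because the table is built once from the calibration, the runtime path reduces to indexing. In this sketch the predefined BEV height is the `height` argument of `build_lut`, so changing it means rebuilding the table rather than changing the runtime code.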

Code & Models

Repositories

hustvl/gkt
PyTorch · Official

Models

qualcomm/GKT
Hugging Face model

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Label Smoothing · Softmax · Byte Pair Encoding · Adam · Dropout · Residual Connection