PETR: Position Embedding Transformation for Multi-View 3D Object Detection

Yingfei Liu; Tiancai Wang; Xiangyu Zhang; Jian Sun

arXiv:2203.05625 · cs.CV · July 20, 2022

TL;DR

PETR introduces a novel position embedding transformation method that encodes 3D coordinate information into image features, enabling end-to-end multi-view 3D object detection with state-of-the-art results on nuScenes.

Contribution

The paper presents PETR, a new approach that integrates 3D position information into image features for improved multi-view 3D object detection.

Findings

1. Achieves 50.4% NDS and 44.1% mAP on nuScenes
2. Ranks 1st on the benchmark
3. Serves as a strong baseline for future research
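NDS, the nuScenes detection score quoted above, is defined by the nuScenes benchmark as a weighted mix of mAP and five true-positive error metrics (translation, scale, orientation, velocity and attribute errors): NDS = (1/10)[5·mAP + Σ(1 − min(1, mTP))]. The Python sketch below follows that public definition; it is not taken from the PETR code, the function name is hypothetical, and the inputs in the usage lines are illustrative values, not PETR's reported metrics.

```python
def nuscenes_detection_score(m_ap, tp_errors):
    """NDS = (5 * mAP + sum of (1 - min(1, err)) over the five TP error metrics) / 10."""
    expected = {"mATE", "mASE", "mAOE", "mAVE", "mAAE"}
    if set(tp_errors) != expected:
        raise ValueError(f"tp_errors must have exactly the keys {sorted(expected)}")
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]  # errors above 1 are clipped
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


# Hypothetical inputs, not PETR's reported error metrics.
TP_KEYS = ["mATE", "mASE", "mAOE", "mAVE", "mAAE"]
perfect = nuscenes_detection_score(1.0, dict.fromkeys(TP_KEYS, 0.0))
example = nuscenes_detection_score(0.4, dict.fromkeys(TP_KEYS, 0.5))
print(perfect, example)  # 1.0 0.45
```

Because mAP carries half of the total weight, NDS also rewards accurate box size, orientation and velocity estimates, not only correct detections.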

Abstract

In this paper, we develop position embedding transformation (PETR) for multi-view 3D object detection. PETR encodes the position information of 3D coordinates into image features, producing 3D position-aware features. Object queries can perceive the 3D position-aware features and perform end-to-end object detection. PETR achieves state-of-the-art performance (50.4% NDS and 44.1% mAP) on the standard nuScenes dataset and ranks 1st on the benchmark. It can serve as a simple yet strong baseline for future research. Code is available at https://github.com/megvii-research/PETR.
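The pipeline the abstract describes (attach 3D coordinates to image features, then let object queries attend over all views) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the official megvii-research/PETR implementation. It assumes the 3D coordinates come from a discretised camera frustum back-projected with each camera's intrinsics and extrinsics. The helper names, uniform depth bins, untrained two-layer MLP standing in for the 3D position encoder, toy camera matrices and region of interest, and single-head attention with key = feature + embedding and value = feature are all hypothetical choices made for the sketch.

```python
import numpy as np


def frustum_points(h, w, depths, stride):
    """Homogeneous frustum points (u*d, v*d, d, 1) for every feature pixel and depth bin: (h, w, D, 4)."""
    v, u, d = np.meshgrid((np.arange(h) + 0.5) * stride,  # pixel-centre row in image coordinates
                          (np.arange(w) + 0.5) * stride,  # pixel-centre column in image coordinates
                          np.asarray(depths, dtype=float),
                          indexing="ij")
    return np.stack([u * d, v * d, d, np.ones_like(d)], axis=-1)


def to_world(points, intrinsics, cam_to_world):
    """Back-project frustum points with K^-1, then apply the camera-to-world extrinsics: (h, w, D, 3)."""
    k4 = np.eye(4)
    k4[:3, :3] = intrinsics
    img_to_world = cam_to_world @ np.linalg.inv(k4)
    return (points @ img_to_world.T)[..., :3]


def position_embedding(world, roi_min, roi_max, w1, w2):
    """Normalise to the region of interest, flatten the depth axis, apply a ReLU MLP: (h, w, C)."""
    norm = (world - roi_min) / (roi_max - roi_min)  # in [0, 1] for points inside the ROI
    flat = norm.reshape(norm.shape[0], norm.shape[1], -1)  # (h, w, D * 3)
    return np.maximum(flat @ w1, 0.0) @ w2


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention of object queries over all image tokens."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values


# Toy usage; every size, matrix and range below is a hypothetical stand-in.
rng = np.random.default_rng(0)
h, w, D, C = 4, 6, 8, 16
depths = np.linspace(1.0, 60.0, D)  # uniform bins, a simplification
K = np.array([[500.0, 0.0, 48.0], [0.0, 500.0, 32.0], [0.0, 0.0, 1.0]])  # shared intrinsics
roi_min, roi_max = np.array([-61.2, -61.2, -10.0]), np.array([61.2, 61.2, 10.0])
w1 = rng.normal(scale=0.1, size=(D * 3, 4 * C))  # untrained position-encoder weights
w2 = rng.normal(scale=0.1, size=(4 * C, C))

views = []
for yaw in (0.0, np.pi):  # two cameras facing opposite directions, rotated about the camera y-axis
    c, s = np.cos(yaw), np.sin(yaw)
    cam_to_world = np.array([[c, 0.0, s, 0.0], [0.0, 1.0, 0.0, 0.0],
                             [-s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]])
    feat = rng.normal(size=(h, w, C))  # stand-in for this view's 2D backbone features
    pe = position_embedding(to_world(frustum_points(h, w, depths, 16), K, cam_to_world),
                            roi_min, roi_max, w1, w2)
    views.append((feat + pe, feat))  # (3D position-aware key, value)

keys = np.concatenate([k.reshape(-1, C) for k, _ in views])  # (2 * h * w, C)
values = np.concatenate([v.reshape(-1, C) for _, v in views])
queries = rng.normal(size=(10, C))  # 10 object queries
out = cross_attention(queries, keys, values)
print(out.shape)  # (10, 16)
```

Because every view's embedding is expressed in one shared world frame, a single set of queries can attend over the concatenated tokens of all cameras directly.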

Code & Models

Repositories

megvii-research/petr (PyTorch, official)