TL;DR
PETR introduces a novel position embedding transformation method that encodes 3D coordinate information into image features, enabling end-to-end multi-view 3D object detection with state-of-the-art results on nuScenes.
Contribution
The paper presents PETR, which encodes the position information of 3D coordinates into multi-view image features, so that object queries can attend to 3D position-aware features and detect objects end to end.
Findings
Achieves 50.4% NDS and 44.1% mAP on nuScenes
Ranks 1st on the nuScenes benchmark, as reported by the authors
Proposed as a simple yet strong baseline for future research
Abstract
In this paper, we develop position embedding transformation (PETR) for multi-view 3D object detection. PETR encodes the position information of 3D coordinates into image features, producing 3D position-aware features. Object queries can perceive the 3D position-aware features and perform end-to-end object detection. PETR achieves state-of-the-art performance (50.4% NDS and 44.1% mAP) on the standard nuScenes dataset and ranks 1st on the benchmark. It can serve as a simple yet strong baseline for future research. Code is available at https://github.com/megvii-research/PETR.
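The idea of turning image features into 3D position-aware features can be illustrated with a small sketch. This is not the authors' implementation: it assumes a pinhole camera, a fixed list of depth bins per pixel, a box-shaped region of interest for normalization, and a single fixed linear layer standing in for whatever learned encoder the paper uses. All function names, parameters, and values below are hypothetical.

```python
# Sketch only: names, shapes, and parameter values are illustrative assumptions,
# not taken from the PETR codebase.

def backproject(u, v, d, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth d into the camera frame."""
    return [(u - cx) * d / fx, (v - cy) * d / fy, d]


def to_world(p, rotation, translation):
    """Rigid camera-to-world transform: rotation @ p + translation."""
    return [sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)]


def normalize(p, lo, hi):
    """Scale a 3D point into [0, 1] per axis, clamping to the region of interest."""
    return [min(1.0, max(0.0, (p[i] - lo[i]) / (hi[i] - lo[i]))) for i in range(3)]


def linear(x, weight, bias):
    """One linear layer, standing in for a learned position encoder."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def position_aware_features(features, depths, intrinsics, rotation, translation,
                            lo, hi, weight, bias):
    """Add a 3D position embedding to every pixel of an H x W x C feature map."""
    fx, fy, cx, cy = intrinsics
    out = []
    for v, row in enumerate(features):
        out_row = []
        for u, feat in enumerate(row):
            # Collect normalized 3D coordinates along this pixel's camera ray.
            coords = []
            for d in depths:
                cam = backproject(u, v, d, fx, fy, cx, cy)
                coords.extend(normalize(to_world(cam, rotation, translation), lo, hi))
            emb = linear(coords, weight, bias)
            out_row.append([f + e for f, e in zip(feat, emb)])
        out.append(out_row)
    return out


# Toy setup: a 2 x 3 feature map with 4 channels and 2 depth bins.
H, W, C = 2, 3, 4
depths = [1.0, 5.0]
features = [[[0.0] * C for _ in range(W)] for _ in range(H)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weight = [[0.1 * (c + 1)] * (3 * len(depths)) for c in range(C)]  # fixed, not learned
bias = [0.0] * C

aware = position_aware_features(
    features, depths, intrinsics=(1.0, 1.0, 1.0, 0.5),
    rotation=identity, translation=[0.0, 0.0, 0.0],
    lo=[-10.0, -10.0, 0.0], hi=[10.0, 10.0, 10.0],
    weight=weight, bias=bias,
)
```

In this toy setup the output keeps the shape of the input feature map, but each pixel's feature vector now depends on where its camera ray lies in 3D space. In a multi-view setting, the same procedure would be run per camera with that camera's own intrinsics and pose, which is what would let a shared set of object queries attend over all views in one coordinate frame.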
