HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird's Eye View

Yiming Wu; Ruixiang Li; Zequn Qin; Xinhai Zhao; Xi Li

arXiv:2307.13510·cs.CV·July 17, 2024

TL;DR

HeightFormer explicitly models heights in BEV space for camera-only 3D object detection, achieving state-of-the-art results among camera-only methods without extra data such as LiDAR.

Contribution

The paper proposes HeightFormer, a method that explicitly models heights in BEV space. It gives a theoretical proof that height-based and depth-based methods are equivalent and reports state-of-the-art performance among camera-only methods.

Findings

01  Achieves state-of-the-art performance among camera-only methods

02  Models heights and uncertainties in a self-recursive manner

03  Needs no extra data such as LiDAR for height estimation

Abstract

Vision-based Bird's Eye View (BEV) representation is an emerging perception formulation for autonomous driving. The core challenge is to construct BEV space with multi-camera features, which is a one-to-many ill-posed problem. Diving into all previous BEV representation generation methods, we found that most of them fall into two types: modeling depths in image views or modeling heights in the BEV space, mostly in an implicit way. In this work, we propose to explicitly model heights in the BEV space, which needs no extra data like LiDAR and can fit arbitrary camera rigs and types compared to modeling depths. Theoretically, we give proof of the equivalence between height-based methods and depth-based methods. Considering the equivalence and some advantages of modeling heights, we propose HeightFormer, which models heights and uncertainties in a self-recursive way. Without any extra data,…
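The equivalence claim in the abstract can be made concrete with a small geometric illustration. The sketch below is not the paper's proof, which the abstract does not reproduce. It only shows, for a single camera ray, that a point's depth along the ray and its height above the ground determine each other whenever the ray is not horizontal. The names `height_from_depth` and `depth_from_height`, the camera position, and the ray direction are all hypothetical choices made for this example, and "depth" here is assumed to mean distance along the ray.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def height_from_depth(cam_center, ray_dir, depth):
    """Height (z coordinate) of the point at distance `depth` along the ray.

    Assumes z is the vertical axis and ray_dir is a unit vector.
    """
    return cam_center[2] + depth * ray_dir[2]


def depth_from_height(cam_center, ray_dir, height):
    """Distance along the ray at which the ray reaches the given height.

    Undefined for a horizontal ray, where every depth has the same height.
    """
    rz = ray_dir[2]
    if abs(rz) < 1e-9:
        raise ValueError("ray is horizontal; height does not determine depth")
    return (height - cam_center[2]) / rz


# Hypothetical setup: camera 1.5 m above the ground, ray tilted slightly down.
cam_center = (0.0, 0.0, 1.5)
ray_dir = normalize((1.0, 0.0, -0.1))

depth = 10.0
h = height_from_depth(cam_center, ray_dir, depth)
recovered = depth_from_height(cam_center, ray_dir, h)
```

In this toy setting `recovered` matches `depth` up to floating-point error, so predicting either quantity for a given ray fixes the other. A depth-based method makes that prediction per image pixel, while a height-based method makes it per BEV location.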

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques