Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View
Jiayu Yang, Enze Xie, Miaomiao Liu, Jose M. Alvarez

TL;DR
This paper introduces a parametric depth-based feature transformation method for bird's-eye-view perception in autonomous driving, improving object detection and segmentation by leveraging geometry and visibility information.
Contribution
We propose a novel parametric depth distribution modeling approach for BEV feature transformation, addressing memory issues and hallucination problems in perception tasks.
Findings
Outperforms existing methods on the nuScenes dataset for both object detection and segmentation.
Provides reliable visibility-aware estimations to reduce hallucination.
Enhances downstream perception tasks with geometry-informed features.
Abstract
Recent vision-only perception models for autonomous driving have achieved promising results by encoding multi-view image features into Bird's-Eye-View (BEV) space. A critical step and the main bottleneck of these methods is transforming image features into the BEV coordinate frame. This paper focuses on leveraging geometry information, such as depth, to model this feature transformation. Existing works either rely on non-parametric depth distribution modeling, which leads to significant memory consumption, or ignore the geometry information to avoid this cost. In contrast, we propose to use parametric depth distribution modeling for feature transformation. We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view. Then, we aggregate the 3D feature volume based on the 3D space occupancy derived from depth…
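The lifting step described in the abstract can be illustrated with a minimal sketch. The abstract excerpt does not name the distribution family, so the Laplace form used here is an assumption, as are the function names (laplace_cdf, lift_features), the tensor shapes, and the uniform depth bins. The idea shown is that the network only has to predict two parameters per pixel (a location mu and a scale b); the per-bin depth weights are then computed analytically from the CDF rather than predicted as a full categorical vector.

```python
import numpy as np

def laplace_cdf(x, mu, b):
    """CDF of a Laplace distribution with location mu and scale b (assumed family)."""
    z = (x - mu) / b
    # Clamp each branch's argument so np.exp never overflows in the unused branch.
    lower = 0.5 * np.exp(np.minimum(z, 0.0))
    upper = 1.0 - 0.5 * np.exp(-np.maximum(z, 0.0))
    return np.where(z < 0, lower, upper)

def lift_features(feats, mu, b, bin_edges):
    """Lift per-pixel features into depth bins using a parametric depth distribution.

    feats:     (H, W, C) image features
    mu, b:     (H, W) predicted depth location and scale per pixel
    bin_edges: (K + 1,) depth bin boundaries along each camera ray
    returns:   volume (H, W, K, C) and weights (H, W, K)
    """
    cdf = laplace_cdf(bin_edges[None, None, :], mu[..., None], b[..., None])
    # Probability mass that the surface lies inside each depth bin.
    weights = np.diff(cdf, axis=-1)
    # Each feature vector is spread along its ray, scaled by the bin mass.
    volume = weights[..., None] * feats[..., None, :]
    return volume, weights

if __name__ == "__main__":
    H, W, C = 2, 3, 4
    feats = np.ones((H, W, C))
    mu = np.full((H, W), 10.5)         # hypothetical predicted depth in metres
    b = np.full((H, W), 1.0)           # hypothetical predicted scale
    bin_edges = np.linspace(0.0, 60.0, 61)
    volume, weights = lift_features(feats, mu, b, bin_edges)
    print(volume.shape, weights.argmax(axis=-1)[0, 0])
```

A small scale b concentrates the features in a few bins near mu, while a large b spreads them along the ray. The occupancy-based aggregation mentioned in the last sentence of the excerpt is not covered by this sketch.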
Taxonomy
Topics: Advanced Vision and Imaging · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
