Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View

Jiayu Yang; Enze Xie; Miaomiao Liu; Jose M. Alvarez

arXiv:2307.04106 · cs.CV · July 13, 2023 · 1 citation

TL;DR

This paper introduces a parametric depth-based feature transformation method for bird's-eye-view perception in autonomous driving, improving object detection and segmentation by leveraging geometry and visibility information.

Contribution

We propose a parametric depth distribution modeling approach for BEV feature transformation. It avoids the heavy memory consumption of non-parametric depth distributions and reduces hallucination through visibility-aware estimation.

Findings

1. Outperforms existing methods on the nuScenes dataset for both object detection and segmentation.

2. Provides reliable visibility-aware estimations that reduce hallucination.

3. Enhances downstream perception tasks with geometry-informed features.

Abstract

Recent vision-only perception models for autonomous driving have achieved promising results by encoding multi-view image features into Bird's-Eye-View (BEV) space. A critical step, and the main bottleneck, of these methods is transforming image features into the BEV coordinate frame. This paper focuses on leveraging geometry information, such as depth, to model this feature transformation. Existing works either rely on non-parametric depth distribution modeling, which leads to significant memory consumption, or ignore the geometry information altogether to avoid that cost. In contrast, we propose to use parametric depth distribution modeling for feature transformation. We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view. Then, we aggregate the 3D feature volume based on the 3D space occupancy derived from depth…
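The lifting step described in the abstract can be illustrated with a minimal sketch for a single pixel. It is not the authors' implementation: the Gaussian family, the uniform depth bins, and the helper names `gaussian_cdf`, `bin_probabilities`, and `lift_pixel` are all assumptions made for illustration, since the abstract does not state which parametric distribution is used. The point it shows is that the network would only need to predict two numbers per pixel (here a mean and a standard deviation), from which a probability for every depth bin can be derived on the fly.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bin_probabilities(mu, sigma, edges):
    """Probability mass of the (assumed Gaussian) depth distribution in each bin.

    `edges` holds the bin boundaries in metres; the masses are renormalised
    so that they sum to 1 over the covered depth range.
    """
    masses = [
        gaussian_cdf(hi, mu, sigma) - gaussian_cdf(lo, mu, sigma)
        for lo, hi in zip(edges[:-1], edges[1:])
    ]
    total = sum(masses)
    if total <= 0.0:
        # The predicted depth lies far outside the covered range.
        return [0.0] * len(masses)
    return [m / total for m in masses]


def lift_pixel(feature, mu, sigma, edges):
    """Spread one pixel's feature vector along its camera ray.

    Returns one weighted copy of `feature` per depth bin, where the weight
    is the probability that the pixel's depth falls inside that bin.
    """
    probs = bin_probabilities(mu, sigma, edges)
    return [[p * c for c in feature] for p in probs]


# Hypothetical setup: 15 uniform bins covering 1 m to 61 m.
edges = [float(d) for d in range(1, 62, 4)]
feature = [0.5, -1.0, 2.0]

probs = bin_probabilities(mu=20.0, sigma=2.0, edges=edges)
lifted = lift_pixel(feature, mu=20.0, sigma=2.0, edges=edges)
peak_bin = max(range(len(probs)), key=probs.__getitem__)
```

In this toy setup the pixel stores 2 distribution parameters instead of 15 per-bin probabilities, which is the kind of saving that motivates a parametric model. The occupancy-based aggregation mentioned next in the abstract is not covered by this sketch.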

Code & Models

Repositories

nvlabs/parametricbev (PyTorch)

Taxonomy

Topics: Advanced Vision and Imaging · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques