BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection

Wenjie Wang; Yehao Lu; Guangcong Zheng; Shuigen Zhan; Xiaoqing Ye; Zichang Tan; Jingdong Wang; Gaoang Wang; Xi Li

arXiv:2406.08785 · cs.CV · June 14, 2024 · 1 citation



TL;DR

BEVSpread introduces a novel voxel pooling method for roadside 3D object detection that reduces position approximation errors by spreading features to surrounding grids, significantly improving detection accuracy.
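To make the "position approximation error" concrete, here is a minimal NumPy sketch of conventional nearest-cell voxel (sum) pooling, the step BEVSpread targets. The function name, cell size, and grid layout are illustrative assumptions, not taken from the paper or its repository. For each frustum point the sketch also reports how far its true BEV position lies from the centre of the single cell it is assigned to.

```python
import numpy as np

def voxel_pool_nearest(points_xy, feats, grid_size, cell):
    """Sum-pool point features into the single BEV cell containing each point.

    Hypothetical helper for illustration only.
    points_xy: (N, 2) continuous BEV coordinates (x, y) in metres, origin at the grid corner.
    feats:     (N, C) image features carried by each frustum point.
    grid_size: (H, W) number of BEV cells along y and x.
    cell:      cell edge length in metres.
    """
    H, W = grid_size
    idx = np.floor(points_xy / cell).astype(int)  # (N, 2) integer cell index (x, y)
    valid = (idx[:, 0] >= 0) & (idx[:, 0] < W) & (idx[:, 1] >= 0) & (idx[:, 1] < H)
    bev = np.zeros((H, W, feats.shape[1]), dtype=float)
    # Every point is collapsed onto exactly one cell; duplicates are summed.
    np.add.at(bev, (idx[valid, 1], idx[valid, 0]), feats[valid])
    # Position approximation error: distance from each point to its cell centre.
    centres = (idx + 0.5) * cell
    err = np.linalg.norm(points_xy - centres, axis=1)
    return bev, err

pts = np.array([[0.1, 0.1], [0.9, 0.9], [1.5, 0.5]])
bev, err = voxel_pool_nearest(pts, np.ones((3, 1)), grid_size=(2, 2), cell=1.0)
```

With a 1 m cell, the first two points are about 1.13 m apart, yet both fall into cell (0, 0) and are each treated as if they sat at its centre, roughly 0.57 m from where they actually are.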

Contribution

The paper proposes BEVSpread, a new voxel pooling strategy that, instead of assigning each frustum point's features to a single BEV grid, spreads them to the surrounding grids with adaptive weights, improving feature propagation in the BEV representation.
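The spreading idea can be illustrated with the NumPy sketch below. This is a sketch under stated assumptions, not the official implementation. Each point scatters its feature over a (2r+1) × (2r+1) neighbourhood of cells using a Gaussian of the point-to-centre distance with a per-point width `sigma`. The weights are then normalised so that each point still contributes a total weight of 1. The function name, the Gaussian form, the neighbourhood `radius`, and the per-point normalisation are all hypothetical choices.

```python
import numpy as np

def spread_voxel_pool(points_xy, feats, sigma, grid_size, cell, radius=1):
    """Scatter each point's feature to the cells around it with distance-based weights.

    Hypothetical sketch: Gaussian weights over a (2r+1)x(2r+1) neighbourhood,
    normalised per point so each point's total contribution is 1.
    points_xy: (N, 2) BEV coordinates (x, y) in metres; feats: (N, C);
    sigma: (N,) per-point spread width in metres.
    """
    H, W = grid_size
    N, C = feats.shape
    base = np.floor(points_xy / cell).astype(int)                  # (N, 2) home cell (x, y)
    offs = np.array([(dx, dy) for dy in range(-radius, radius + 1)
                     for dx in range(-radius, radius + 1)])         # (K, 2) neighbour offsets
    cells = base[:, None, :] + offs[None, :, :]                     # (N, K, 2) candidate cells
    centres = (cells + 0.5) * cell
    d2 = np.sum((points_xy[:, None, :] - centres) ** 2, axis=-1)   # (N, K) squared distances
    w = np.exp(-d2 / (2.0 * sigma[:, None] ** 2))                   # assumed Gaussian decay
    inside = ((cells[..., 0] >= 0) & (cells[..., 0] < W)
              & (cells[..., 1] >= 0) & (cells[..., 1] < H))
    w = w * inside                                                  # drop cells outside the grid
    w = w / np.maximum(w.sum(axis=1, keepdims=True), 1e-12)         # normalise per point
    bev = np.zeros((H, W, C), dtype=float)
    n_idx, k_idx = np.nonzero(inside)
    np.add.at(bev, (cells[n_idx, k_idx, 1], cells[n_idx, k_idx, 0]),
              w[n_idx, k_idx, None] * feats[n_idx])
    return bev

one = np.ones((1, 1))
centred = spread_voxel_pool(np.array([[1.5, 1.5]]), one, np.array([0.5]), (3, 3), 1.0)
near_edge = spread_voxel_pool(np.array([[1.9, 1.5]]), one, np.array([0.5]), (3, 3), 1.0)
```

With a very small `sigma`, the result approaches nearest-cell pooling. A point near a cell border (such as `near_edge`) leaks more weight into the neighbouring cell on that side than into the opposite one, which is the information that single-cell pooling discards.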

Findings

01. Significantly improves AP (average precision) for vehicles, pedestrians, and cyclists.
02. Improves accuracy while keeping inference time comparable to the baseline methods.
03. Works as a plug-in for existing frustum-based BEV methods.

Abstract

Vision-based roadside 3D object detection has attracted rising attention in the autonomous driving domain, since it offers inherent advantages in reducing blind spots and expanding the perception range. Previous work mainly focuses on accurately estimating depth or height for 2D-to-3D mapping but ignores the position approximation error in the voxel pooling process. Inspired by this observation, we propose a novel voxel pooling strategy to reduce such error, dubbed BEVSpread. Specifically, instead of bringing the image features contained in a frustum point to a single BEV grid, BEVSpread considers each frustum point as a source and spreads the image features to the surrounding BEV grids with adaptive weights. To achieve superior propagation performance, a specific weight function is designed to dynamically control the decay speed of the weights according to distance and depth. Aided by…
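The abstract states only that the weight function controls the decay speed according to distance and depth. One hypothetical form is a Gaussian in distance whose width scales with depth as `(depth / depth_ref) ** gamma`. The function name and the parameters `sigma0`, `depth_ref`, and `gamma` are illustrative assumptions, not the paper's formula.

```python
import numpy as np

def spread_weight(dist, depth, sigma0=0.5, depth_ref=50.0, gamma=1.0):
    """Hypothetical spread weight w(dist, depth) in (0, 1].

    dist:  distance (m) from the frustum point to a BEV cell centre.
    depth: depth (m) of the frustum point along its camera ray.
    The Gaussian width is sigma0 * (depth / depth_ref) ** gamma, so gamma > 0
    makes weights of farther points decay more slowly with distance and
    gamma < 0 makes them decay faster. All parameters are assumptions.
    """
    dist = np.asarray(dist, dtype=float)
    depth = np.asarray(depth, dtype=float)
    sigma = sigma0 * (depth / depth_ref) ** gamma
    return np.exp(-dist ** 2 / (2.0 * sigma ** 2))

near = spread_weight(1.0, 20.0)   # narrow kernel: weight almost vanishes at 1 m
far = spread_weight(1.0, 100.0)   # wide kernel: weight stays substantial at 1 m
```

The text does not state whether the weights should decay faster or slower with depth, so the sketch leaves that direction to the sign of `gamma`.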


Code & Models

Repositories

datongjie/bevspread
PyTorch · Official


Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Robotics and Sensor-Based Localization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings