RayFormer: Improving Query-Based Multi-Camera 3D Object Detection via Ray-Centric Strategies

Xiaomeng Chu; Jiajun Deng; Guoliang You; Yifan Duan; Yao Li; Yanyong Zhang

arXiv:2407.14923·cs.CV·November 20, 2024

TL;DR

RayFormer introduces a ray-centric approach to multi-camera 3D object detection, aligning query initialization and feature extraction with camera optics to reduce ambiguous query features and improve detection accuracy.

Contribution

The paper proposes a ray-inspired query initialization and feature extraction method that improves multi-camera 3D detection accuracy by reducing the ambiguity of features sampled by queries near the same camera ray.

Findings

01. Achieves 55.5% mAP on the nuScenes dataset

02. Attains 63.3% NDS, outperforming previous methods

03. Demonstrates the effectiveness of ray-centric strategies in 3D detection

Abstract

Recent advances in query-based multi-camera 3D object detection are characterized by initializing object queries in 3D space and then sampling features from perspective-view images to perform multi-round query refinement. In such a framework, query points near the same camera ray are likely to sample similar features from very close pixels, resulting in ambiguous query features and degraded detection accuracy. To this end, we introduce RayFormer, a camera-ray-inspired query-based 3D object detector that aligns the initialization and feature extraction of object queries with the optical characteristics of cameras. Specifically, RayFormer transforms perspective-view image features into bird's eye view (BEV) via the lift-splat-shoot method and segments the BEV map into sectors based on the camera rays. Object queries are uniformly and sparsely initialized along each camera ray,…
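
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The first part shows why query points on the same camera ray are ambiguous: under an idealized pinhole model they project to the same pixel. The second part places queries at evenly spaced radii along evenly spaced BEV rays. The function names, the intrinsics (`focal`, `cx`, `cy`), the single shared ray origin, and the ray and radius counts are all assumptions made for illustration.

```python
import math

def project_pinhole(point, focal=1000.0, cx=800.0, cy=450.0):
    """Project a 3D point in camera coordinates (z forward) to pixel coordinates.

    The intrinsics are hypothetical values chosen for the example.
    """
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)

def init_ray_queries(num_rays, queries_per_ray, r_min, r_max):
    """Place BEV query points on evenly spaced rays from a single origin.

    Assumption: rays share one origin and have uniform azimuth spacing.
    Returns a list of (x, y) positions ordered ray by ray, near to far.
    """
    queries = []
    for i in range(num_rays):
        theta = 2.0 * math.pi * i / num_rays  # azimuth of ray i
        for j in range(queries_per_ray):
            # Evenly spaced radii in [r_min, r_max].
            t = j / (queries_per_ray - 1) if queries_per_ray > 1 else 0.0
            r = r_min + t * (r_max - r_min)
            queries.append((r * math.cos(theta), r * math.sin(theta)))
    return queries

# Two points on the same camera ray: `far` is `near` scaled by 3.
near = (1.0, 0.2, 10.0)
far = (3.0, 0.6, 30.0)
pix_near = project_pinhole(near)
pix_far = project_pinhole(far)

# Both land on (numerically) the same pixel, so they would sample the same feature.
same_pixel = all(abs(a - b) < 1e-6 for a, b in zip(pix_near, pix_far))

# Sparse layout: 8 rays with 3 queries each, between 5 m and 55 m.
queries = init_ray_queries(num_rays=8, queries_per_ray=3, r_min=5.0, r_max=55.0)
```

With this layout every ray carries only a few queries, each at a distinct depth, which is one simple way to keep queries that share a ray from being redundant.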
