DQ3D: Depth-guided Query for Transformer-Based 3D Object Detection in Traffic Scenarios

Ziyu Wang; Wenhao Li; Ji Wu

arXiv:2510.23144 · cs.CV · October 28, 2025

TL;DR

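This paper introduces DQ3D, a depth-guided query generator for transformer-based 3D object detection in traffic scenarios. It uses depth information and 2D detections to place reference points on or inside objects, which reduces false positives, and fuses historical detection results with the depth-guided queries to handle partially occluded objects.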

Contribution

The paper presents a novel depth-guided query generator and a hybrid attention mechanism to enhance 3D object detection accuracy in traffic scenes.

Findings

01. Outperforms the baseline by 6.3% mAP on nuScenes

02. Improves NDS by 4.3% over the baseline

03. Reduces false positives and handles partially occluded objects

Abstract

3D object detection from multi-view images in traffic scenarios has garnered significant attention in recent years. Many existing approaches rely on object queries that are generated from 3D reference points to localize objects. However, a limitation of these methods is that some reference points are often far from the target object, which can lead to false positive detections. In this paper, we propose a depth-guided query generator for 3D object detection (DQ3D) that leverages depth information and 2D detections to ensure that reference points are sampled from the surface or interior of the object. Furthermore, to address partially occluded objects in the current frame, we introduce a hybrid attention mechanism that fuses historical detection results with depth-guided queries, thereby forming hybrid queries. Evaluation on the nuScenes dataset demonstrates that our method outperforms the…
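To make the idea of depth-guided reference points concrete, the following is a minimal sketch and not the authors' implementation. It assumes a pinhole camera with intrinsics `fx, fy, cx, cy`, a 2D box given as `(u_min, v_min, u_max, v_max)` in pixels, and a hypothetical `depth_at(u, v)` callable standing in for a predicted depth map. All function and parameter names are illustrative. Pixels sampled inside the box are lifted to 3D camera coordinates, so the resulting points lie on the visible surface of whatever the box covers rather than at arbitrary locations in space.

```python
from typing import Callable, List, Tuple

Point3D = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift pixel (u, v) with metric depth into camera coordinates (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def reference_points_from_box(
    box: Tuple[float, float, float, float],
    depth_at: Callable[[float, float], float],  # hypothetical depth lookup
    intrinsics: Tuple[float, float, float, float],
    grid: int = 3,
) -> List[Point3D]:
    """Sample a grid x grid set of pixels inside a 2D box and lift them to 3D."""
    u_min, v_min, u_max, v_max = box
    fx, fy, cx, cy = intrinsics
    points: List[Point3D] = []
    for i in range(grid):
        for j in range(grid):
            # Use cell centres so every sample lies strictly inside the box.
            u = u_min + (i + 0.5) * (u_max - u_min) / grid
            v = v_min + (j + 0.5) * (v_max - v_min) / grid
            d = depth_at(u, v)
            if d > 0:  # skip pixels without a valid depth estimate
                points.append(backproject(u, v, d, fx, fy, cx, cy))
    return points


# Usage: a box in front of the camera with a constant 20 m depth (made-up values).
intrinsics = (1000.0, 1000.0, 800.0, 450.0)
box = (700.0, 400.0, 900.0, 500.0)
points = reference_points_from_box(box, lambda u, v: 20.0, intrinsics, grid=2)
```

In a multi-view setup, these camera-frame points would still need to be transformed into a shared ego or world frame using each camera's extrinsics before serving as query reference points; that step is omitted here.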
