Redundant Queries in DETR-Based 3D Detection Methods: Unnecessary and Prunable
Lizhen Xu, Zehao Wu, Wenzhao Qiu, Shanmin Pang, Xiuxiu Bai, Kuizhi Mei, Jianru Xue

TL;DR
This paper introduces GPQ, a simple query pruning method for DETR-based 3D detection models that reduces redundant queries, accelerates inference, and decreases computational costs without sacrificing detection performance.
Contribution
The paper proposes GPQ, a parameter-free, easy-to-implement query pruning technique that can be applied as a fine-tuning step to improve the efficiency of existing 3D detectors.
Findings
GPQ reduces redundant queries effectively.
Inference speed improves by up to 1.35x on desktops.
FLOPs and inference time decrease significantly on edge devices.
Abstract
Query-based models are extensively used in 3D object detection tasks, with a wide range of pre-trained checkpoints readily available online. However, despite their popularity, these models often require an excessive number of object queries, far surpassing the actual number of objects to detect. The redundant queries result in unnecessary computational and memory costs. In this paper, we find that not all queries contribute equally -- a significant portion of queries have a much smaller impact compared to others. Based on this observation, we propose an embarrassingly simple approach called Gradually Pruning Queries (GPQ), which prunes queries incrementally based on their classification scores. A key advantage of GPQ is that it requires no additional learnable parameters. It is straightforward to implement in any query-based method, as it can be seamlessly integrated as a fine-tuning…
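To make the idea in the abstract concrete, here is a minimal sketch of pruning queries incrementally by classification score. It is an illustration under stated assumptions, not the authors' implementation. The per-query score (the highest class probability), the number of queries dropped per round, and the function names `query_scores`, `prune_lowest`, and `gradually_prune` are all hypothetical choices made for this example.

```python
from typing import Callable, List


def query_scores(class_probs: List[List[float]]) -> List[float]:
    """Score each query by its highest class probability (an assumed scoring rule)."""
    return [max(probs) for probs in class_probs]


def prune_lowest(query_ids: List[int], scores: List[float], num_to_prune: int) -> List[int]:
    """Drop the `num_to_prune` lowest-scoring queries, keeping the original order."""
    if len(query_ids) != len(scores):
        raise ValueError("query_ids and scores must have the same length")
    num_to_prune = max(0, min(num_to_prune, len(query_ids)))
    # Positions sorted from lowest to highest score; sorted() is stable on ties.
    order = sorted(range(len(query_ids)), key=lambda i: scores[i])
    dropped = set(order[:num_to_prune])
    return [q for i, q in enumerate(query_ids) if i not in dropped]


def gradually_prune(
    query_ids: List[int],
    score_fn: Callable[[List[int]], List[float]],
    target: int,
    step: int,
) -> List[int]:
    """Repeatedly remove up to `step` low-scoring queries until `target` remain.

    `score_fn` is a hypothetical callback that returns one score per kept query.
    In a real fine-tuning run, training iterations would take place between
    pruning rounds so the remaining queries can adapt; that part is omitted here.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    kept = list(query_ids)
    while len(kept) > target:
        scores = score_fn(kept)
        kept = prune_lowest(kept, scores, min(step, len(kept) - target))
    return kept


if __name__ == "__main__":
    # Toy scores standing in for classification confidences of six queries.
    toy_scores = {0: 0.9, 1: 0.1, 2: 0.5, 3: 0.05, 4: 0.7, 5: 0.3}
    survivors = gradually_prune(
        list(toy_scores), lambda ids: [toy_scores[i] for i in ids], target=2, step=2
    )
    print(survivors)
```

Because pruning only removes entries from the query set, a scheme like this adds no learnable parameters, which matches the property the abstract highlights.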
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. This paper is the first study to explore query pruning in query-based detectors.
2. The experiments demonstrate the effectiveness of the proposed method, which reduces redundant queries while maintaining performance.
1. The experiment is incomplete and unpersuasive.
- This work proposes a simple and reasonable pruning method for transformer-based 3D detection models.
- Numerous typos are present, such as "3D objec detection" (L351), "each query as as the fundamental" (L264), "it is the queries" (L268), "The reason of why our method" (L300). The authors are encouraged to correct all typographical errors and improve their English writing to enhance readability.
- Lack of comparison with prior work. This paper does not provide comparisons with previous pruning methods or implemented baseline pruning techniques. The authors are encouraged to include comparisons
+ The idea of using classification scores to prune queries is nice.
+ The paper is easy to understand.
+ The approach demonstrates promising results on the nuScenes validation set.
- An alternative way to reduce the processing time of 3D detectors is to use Token Merge [A]. How does query pruning quantitatively compare against Token Merge [A]?
- Another way to reduce processing time is to use model quantization. How does query pruning quantitatively compare against model quantization?
- It would be beneficial to quantitatively include results from the nuScenes leaderboard, particularly comparing against a strong camera baseline like SparseBEV with 640x1600 resolution.
- The ex
- The paper is well written and illustrated. The figures help support the text, e.g. Figs. 1 and 3.
- The approach is well motivated through experiment: Fig. 2 shows that the imbalanced number of selections per query is prevalent across various DETR-based detectors. There indeed seem to be some queries that are used very often and others that are not.
- The experiments are thorough for the proposed approach (but lack deeper/broader investigations; see weaknesses). The approach is evaluated on fou
- The manuscript could have tried harder to investigate why training on a large set of queries and pruning them afterwards is a better choice than training with fewer queries to begin with. The experiments show this (which is good), but the reader is left wondering about the why.
- Which are the queries that get pruned? Is it that they attend to similar areas in space but always lose out against more confident queries?
- If we took the top most confident queries (as per the proposed algori
1. The paper is well-structured, with clear abstract, introduction, methodology, experiments, and conclusion sections that flow logically from one to the next.
2. The proposed Gradually Pruning Queries method is simple and effective.
3. The experimental results, when compared to different standard detectors, show that the design is effective.
1. There are some works aimed at speeding up query-based methods in 2D object detection, but this paper does not discuss them, so it is unclear whether very similar methods already exist in 2D object detection.
[1] Yang, Chenhongyi, Zehao Huang, and Naiyan Wang. "QueryDet: Cascaded sparse query for accelerating high-resolution small object detection." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[2] Zhu, Yuan, Qingyuan Xia, and Wen Jin. "Srdd: a lightweight end-to-en
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
