Can the Query-based Object Detector Be Designed with Fewer Stages?
Jialin Li, Weifu Fu, Yuhuan Lin, Qiang Nie, Yong Liu

TL;DR
This paper introduces GOLO, a two-stage query-based object detector that reduces decoder stages to lower computational costs while maintaining high accuracy, challenging the traditional multi-stage paradigm.
Contribution
Proposes GOLO, a novel two-stage decoding model for query-based object detection, demonstrating comparable performance with fewer stages and reduced computational burden.
Findings
GOLO achieves competitive accuracy on the COCO dataset.
Fewer decoder stages lead to lower computational costs.
Two-stage decoding suffices for high-performance detection.
Abstract
Query-based object detectors have made significant advancements since the publication of DETR. However, most existing methods still rely on multi-stage encoders and decoders, or a combination of both. Despite achieving high accuracy, the multi-stage paradigm (typically consisting of 6 stages) suffers from issues such as heavy computational burden, prompting us to reconsider its necessity. In this paper, we explore multiple techniques to enhance query-based detectors and, based on these findings, propose a novel model called GOLO (Global Once and Local Once), which follows a two-stage decoding paradigm. Compared to other mainstream query-based models with multi-stage decoders, our model employs fewer decoder stages while still achieving considerable performance. Experimental results on the COCO dataset demonstrate the effectiveness of our approach.
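As a rough illustration of the cost argument in the abstract, the sketch below compares a two-stage decoder against the typical six-stage baseline. It assumes every decoder stage costs the same, which is a simplification and not something the paper states. GOLO's global and local stages may well differ in cost, so the real saving could be larger or smaller. The function name `relative_decoder_cost` is hypothetical.

```python
def relative_decoder_cost(num_stages: int, baseline_stages: int = 6) -> float:
    """Return decoder cost relative to a baseline decoder.

    Assumption (not from the paper): each stage has identical cost,
    so total decoder cost scales linearly with the number of stages.
    """
    if num_stages <= 0 or baseline_stages <= 0:
        raise ValueError("stage counts must be positive")
    return num_stages / baseline_stages


if __name__ == "__main__":
    # Two decoder stages versus the typical six-stage baseline.
    ratio = relative_decoder_cost(2)
    print(f"relative decoder cost: {ratio:.2f}")
```

Under this equal-cost assumption, the figure covers only the decoder. Backbone and encoder costs are unchanged, so the end-to-end saving would be smaller than the decoder-only ratio suggests.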
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
