D2Q-DETR: Decoupling and Dynamic Queries for Oriented Object Detection with Transformers
Qiang Zhou, Chaohui Yu, Zhibin Wang, Fan Wang

TL;DR
D2Q-DETR introduces a transformer-based, end-to-end oriented object detection framework that decouples query features, employs dynamic queries, and improves label assignment, achieving state-of-the-art results on large-scale aerial image datasets such as DOTA.
Contribution
The paper proposes a novel DETR-based framework with decoupled query features, dynamic query design, and improved label re-assignment for oriented object detection.
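To make "decoupled query features" concrete, here is a minimal sketch, assuming each object query carries two separate feature vectors: one read only by the classification head and one read only by the point-regression head. The names `DecoupledQuery`, `linear`, and `predict`, and the use of single linear maps as heads, are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


@dataclass
class DecoupledQuery:
    # Hypothetical layout: one query holds two independent feature vectors.
    cls_feat: Vector  # consumed only by the classification head
    reg_feat: Vector  # consumed only by the point-regression head


def predict(query: DecoupledQuery, cls_weights: Matrix,
            reg_weights: Matrix) -> Tuple[Vector, Vector]:
    """Class logits come from cls_feat; point coordinates from reg_feat."""
    logits = linear(cls_weights, query.cls_feat)
    points = linear(reg_weights, query.reg_feat)  # flattened (x, y) pairs
    return logits, points


# Usage: 2-d features, 3 classes, 2 predicted points (4 coordinates).
q = DecoupledQuery(cls_feat=[1.0, 2.0], reg_feat=[0.5, -1.0])
cls_w = [[1, 0], [0, 1], [1, 1]]
reg_w = [[1, 0], [0, 1], [2, 0], [0, 2]]
logits, pts = predict(q, cls_w, reg_w)
```

The point of the split is that gradients from the classification loss and the regression loss update different features, so changing `reg_feat` leaves the class logits untouched.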
Findings
Outperforms existing methods on DOTA datasets
Reduces object queries without performance loss
Achieves state-of-the-art accuracy in aerial image detection
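The query reduction noted above comes from the dynamic query design. A minimal sketch follows, assuming queries are pruned before each decoder layer by keeping the top-k by a confidence score. The per-layer `schedule`, the `score_fn`, and the function names are hypothetical; the excerpt does not specify how the paper selects or counts queries.

```python
from typing import Callable, List, Sequence, Tuple


def prune_queries(scores: Sequence[float], keep: int) -> List[int]:
    """Return indices of the `keep` highest-scoring queries, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:keep]


def dynamic_decode(queries: Sequence[int], schedule: Sequence[int],
                   score_fn: Callable[[int], float]) -> Tuple[List[int], List[int]]:
    """Shrink the active query set before each decoder layer.

    queries:  query ids standing in for query features
    schedule: hypothetical number of queries kept at each layer
    score_fn: hypothetical per-query confidence (e.g. max class probability)
    """
    active = list(queries)
    history = []
    for keep in schedule:
        scores = [score_fn(q) for q in active]
        idx = prune_queries(scores, min(keep, len(active)))
        active = [active[i] for i in idx]
        history.append(len(active))
        # A real decoder layer would refine the surviving queries here.
    return active, history


# Usage: 10 queries whose confidence equals their id, pruned 10 -> 8 -> 5 -> 3.
final, sizes = dynamic_decode(range(10), schedule=[8, 5, 3], score_fn=float)
```

Because self-attention cost grows quadratically with the number of queries, shrinking the set in later layers is where the efficiency gain would come from in a design like this.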
Abstract
Despite the promising results, existing oriented object detection methods usually involve heuristically designed rules, e.g., RRoI generation, rotated NMS. In this paper, we propose an end-to-end framework for oriented object detection, which simplifies the model pipeline and obtains superior performance. Our framework is based on DETR, with the box regression head replaced with a points prediction head. The learning of points is more flexible, and the distribution of points can reflect the angle and size of the target rotated box. We further propose to decouple the query features into classification and regression features, which significantly improves the model precision. Aerial images usually contain thousands of instances. To better balance model precision and efficiency, we propose a novel dynamic query design, which reduces the number of object queries in stacked decoder layers…
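The abstract says the distribution of predicted points reflects the angle and size of the rotated box. The excerpt does not give the paper's point-to-box conversion, so the sketch below swaps in a simple, named alternative: principal component analysis (PCA) of the point set. The angle comes from the 2x2 point covariance, and the width and height are the extents along the principal axis and its normal. `points_to_rbox` is a hypothetical name; a minimum-area rectangle fit is another common choice.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def points_to_rbox(points: List[Point]) -> Tuple[float, float, float, float, float]:
    """Fit an oriented box (cx, cy, w, h, theta) to a point set via PCA.

    theta (radians) is the angle of the principal axis of the covariance;
    w and h are the extents along that axis and its perpendicular.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Project each point onto the principal axis (u) and its normal (v).
    us = [(x - mx) * c + (y - my) * s for x, y in points]
    vs = [-(x - mx) * s + (y - my) * c for x, y in points]
    w, h = max(us) - min(us), max(vs) - min(vs)
    # Box centre in the rotated frame, mapped back to image coordinates.
    uc, vc = (max(us) + min(us)) / 2.0, (max(vs) + min(vs)) / 2.0
    return mx + uc * c - vc * s, my + uc * s + vc * c, w, h, theta


# Usage: corners of a 4 x 2 rectangle centred at (1, 1), rotated by 30 degrees.
a = math.pi / 6
rect = [(-1.0, 0.0), (3.0, 0.0), (3.0, 2.0), (-1.0, 2.0)]
rotated = [(x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))
           for x, y in rect]
cx, cy, w, h, theta = points_to_rbox(rotated)
```

For a near-square point set the covariance is nearly isotropic, so the PCA angle becomes unstable; that is one reason a learned or minimum-area conversion may be preferred in practice.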
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
