Oriented Object Detection with Transformer
Teli Ma, Mingyuan Mao, Honghui Zheng, Peng Gao, Xiaodi Wang, Shumin Han, Errui Ding, Baochang Zhang, David Doermann

TL;DR
This paper introduces O2DETR, a Transformer-based method for oriented object detection that simplifies the process and improves accuracy, achieving state-of-the-art results on the DOTA dataset.
Contribution
The paper presents a novel end-to-end oriented object detection framework using Transformers with a new efficient encoder, significantly reducing computational costs and setting new benchmarks.
Findings
O2DETR achieves up to 3.85 mAP improvement over Faster R-CNN and RetinaNet.
The proposed encoder reduces memory and computational costs.
O2DETR performs competitively on the DOTA dataset.
Abstract
Object detection with Transformers (DETR) has achieved competitive performance compared with traditional detectors such as Faster R-CNN. However, the potential of DETR remains largely unexplored for the more challenging task of arbitrary-oriented object detection. We provide the first attempt and implement Oriented Object DEtection with TRansformer (O2DETR) based on an end-to-end network. The contributions of O2DETR include: 1) we provide a new insight into oriented object detection, by applying Transformer to directly and efficiently localize objects without a tedious process of rotated anchors as in conventional detectors; 2) we design a simple but highly efficient encoder for Transformer by replacing the attention mechanism with depthwise separable convolution, which can significantly reduce the memory and computational cost of using multi-scale features in the…
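To give a rough sense of why swapping attention for depthwise separable convolution can cut cost, the sketch below counts parameters and multiply-accumulate operations (MACs) for a standard convolution, a depthwise separable convolution, and dense self-attention over the same feature map. It is a back-of-the-envelope illustration only: the function names, the 256-channel width, the 3x3 kernel, and the 64x64 feature map are all assumed values chosen for the example, not figures taken from the paper, and the abstract does not describe the actual encoder layout.

```python
def standard_conv_cost(c_in, c_out, k, h, w):
    """Params and MACs of a standard k x k conv (no bias, stride 1, 'same' padding)."""
    params = c_in * c_out * k * k
    macs = params * h * w  # every output position applies the full kernel
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h, w):
    """Params and MACs of a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv that mixes channels
    params = depthwise + pointwise
    macs = params * h * w
    return params, macs


def self_attention_macs(n_tokens, d_model):
    """MACs of dense self-attention: Q @ K^T plus the weighted sum over V.

    The Q/K/V/output projections are deliberately left out, so this is a lower bound.
    """
    return 2 * n_tokens * n_tokens * d_model


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    c, k, h, w = 256, 3, 64, 64

    std_params, std_macs = standard_conv_cost(c, c, k, h, w)
    ds_params, ds_macs = depthwise_separable_cost(c, c, k, h, w)
    attn_macs = self_attention_macs(h * w, c)

    print(f"standard conv:        {std_params:>12,} params {std_macs:>16,} MACs")
    print(f"depthwise separable:  {ds_params:>12,} params {ds_macs:>16,} MACs")
    print(f"dense self-attention: {'':>12}        {attn_macs:>16,} MACs")
    print(f"conv param ratio (standard / separable): {std_params / ds_params:.2f}")
```

The convolution cost grows linearly with the number of spatial positions, whereas dense self-attention grows quadratically with the number of tokens, which is the usual reason multi-scale feature maps are expensive to feed through an attention-based encoder.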
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Video Surveillance and Tracking Methods
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Feature Pyramid Network · Absolute Position Encodings · Position-Wise Feed-Forward Layer · 1x1 Convolution · Byte Pair Encoding · Adam · RoIPool
