Oriented Object Detection with Transformer

Teli Ma; Mingyuan Mao; Honghui Zheng; Peng Gao; Xiaodi Wang; Shumin Han; Errui Ding; Baochang Zhang; David Doermann

arXiv:2106.03146 · cs.CV · June 8, 2021 · 31 citations

TL;DR

This paper introduces O2DETR, a Transformer-based method for oriented object detection that localizes objects directly, without rotated anchors. It improves on Faster R-CNN and RetinaNet by up to 3.85 mAP and reports competitive results on the DOTA dataset.

Contribution

The paper presents an end-to-end oriented object detection framework built on Transformers, with an efficient encoder that replaces the attention mechanism with depthwise separable convolution to reduce the memory and computational cost of using multi-scale features.

Findings

01

O2DETR achieves up to 3.85 mAP improvement over Faster R-CNN and RetinaNet.

02

The proposed encoder reduces memory and computational costs.

03

O2DETR performs competitively on the DOTA dataset.

Abstract

Object detection with Transformers (DETR) has achieved competitive performance against traditional detectors such as Faster R-CNN. However, the potential of DETR remains largely unexplored for the more challenging task of arbitrary-oriented object detection. We provide the first attempt and implement Oriented Object DEtection with TRansformer ($O^{2}$DETR) based on an end-to-end network. The contributions of $O^{2}$DETR include: 1) we provide a new insight into oriented object detection by applying a Transformer to directly and efficiently localize objects, without the tedious process of rotated anchors used in conventional detectors; 2) we design a simple but highly efficient encoder for the Transformer by replacing the attention mechanism with depthwise separable convolution, which can significantly reduce the memory and computational cost of using multi-scale features in the…
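To make the encoder's cost argument concrete, here is a minimal sketch that counts approximate multiply-accumulate operations (MACs) for one self-attention layer and for one depthwise separable convolution over the same feature map. This is textbook arithmetic, not the paper's measurements or its exact encoder design. The function names, the 100 x 100 x 256 feature-map size, the 3 x 3 kernel, and the stride-1, same-padding setting are all assumptions chosen for illustration.

```python
"""Rough cost comparison: self-attention vs. depthwise separable convolution.

Illustrative arithmetic only; not the paper's exact encoder or measurements.
All names and sizes below are hypothetical.
"""


def self_attention_macs(num_tokens: int, dim: int) -> int:
    """Approximate MACs for one self-attention layer over flattened tokens."""
    # Q, K, V and output projections: four (dim x dim) matrices per token.
    projections = 4 * num_tokens * dim * dim
    # QK^T score matrix plus the weighted sum over V: quadratic in token count.
    attention = 2 * num_tokens * num_tokens * dim
    return projections + attention


def depthwise_separable_conv_macs(num_tokens: int, dim: int, kernel_size: int = 3) -> int:
    """Approximate MACs for a depthwise conv followed by a 1x1 pointwise conv.

    Assumes stride 1 and same padding, so there is one output per input position.
    """
    # Depthwise step: one k x k filter per channel, applied at every position.
    depthwise = num_tokens * dim * kernel_size * kernel_size
    # Pointwise step: a 1x1 convolution that mixes channels at every position.
    pointwise = num_tokens * dim * dim
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical sizes: a 100 x 100 feature map with 256 channels.
    height, width, channels = 100, 100, 256
    tokens = height * width
    attn = self_attention_macs(tokens, channels)
    conv = depthwise_separable_conv_macs(tokens, channels)
    print(f"self-attention:           {attn:,} MACs")
    print(f"depthwise separable conv: {conv:,} MACs")
    print(f"ratio:                    {attn / conv:.1f}x")
```

The attention term grows with the square of the number of spatial positions, while both convolution terms grow linearly, so the gap widens as higher-resolution levels of a multi-scale feature pyramid are added.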

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Video Surveillance and Tracking Methods

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Feature Pyramid Network · Absolute Position Encodings · Position-Wise Feed-Forward Layer · 1x1 Convolution · Byte Pair Encoding · Adam · RoIPool