Learning RoI Transformer for Detecting Oriented Objects in Aerial Images
Jian Ding, Nan Xue, Yang Long, Gui-Song Xia, Qikai Lu

TL;DR
This paper introduces a lightweight RoI Transformer that improves oriented object detection in aerial images by transforming horizontal proposals into rotated ones and extracting rotation-invariant features from them, achieving state-of-the-art results on the DOTA and HRSC2016 datasets.
Contribution
The paper proposes a novel RoI Transformer with a Rotated RoI learner and a rotation-invariant feature extraction module, improving detection accuracy without significant computational overhead.
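As a rough illustration of the first idea (turning a horizontal proposal into a rotated one), the sketch below decodes a set of regressed offsets against a horizontal RoI and returns the corner points of the resulting rotated box. The offset parameterization (dx, dy, dw, dh, dtheta) and the function names decode_rroi and rroi_corners are assumptions made for illustration; they are not taken from the paper, whose exact encoding may differ. The feature-extraction step is not shown.

```python
import math


def decode_rroi(hroi, offsets):
    """Decode a rotated RoI (cx, cy, w, h, theta) from a horizontal RoI.

    hroi:    (x1, y1, x2, y2) corners of the horizontal proposal.
    offsets: (dx, dy, dw, dh, dtheta), a hypothetical parameterization in
             which dx, dy are scaled by the RoI size, dw, dh are log-scale
             factors, and dtheta is an angle in radians.
    """
    x1, y1, x2, y2 = hroi
    dx, dy, dw, dh, dtheta = offsets
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    # A horizontal RoI has angle 0, so the decoded angle is just dtheta.
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh), dtheta)


def rroi_corners(rroi):
    """Return the four corner points of a rotated RoI, in order around the box."""
    cx, cy, w, h, theta = rroi
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for ux, uy in ((-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)):
        lx, ly = ux * w, uy * h  # corner in the box's own coordinate frame
        # Rotate by theta, then translate to the box centre.
        corners.append((cx + lx * c - ly * s, cy + lx * s + ly * c))
    return corners


if __name__ == "__main__":
    # A 4x2 horizontal proposal rotated by 30 degrees about its centre.
    rroi = decode_rroi((0.0, 0.0, 4.0, 2.0), (0.0, 0.0, 0.0, 0.0, math.pi / 6))
    print(rroi_corners(rroi))
```

With all offsets set to zero, the decoded box coincides with the original horizontal proposal, which is a convenient sanity check for any encoding of this kind.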
Findings
Achieved state-of-the-art performance on DOTA and HRSC2016 datasets.
Outperformed deformable RoI pooling methods when oriented bounding-box annotations are available.
Maintained high detection speed while improving accuracy.
Abstract
Object detection in aerial images is an active yet challenging task in computer vision because of the bird's-eye-view perspective, the highly complex backgrounds, and the varied appearances of objects. Especially when detecting densely packed objects in aerial images, methods relying on horizontal proposals for common object detection often introduce mismatches between the Regions of Interest (RoIs) and objects. This leads to the common misalignment between the final object classification confidence and localization accuracy. Although rotated anchors have been used to tackle this problem, their design always multiplies the number of anchors and dramatically increases the computational complexity. In this paper, we propose a RoI Transformer to address these problems. More precisely, to improve the quality of region proposals, we first designed a Rotated RoI (RRoI) learner to transform a…
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
