H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection
Xue Yang, Gefan Zhang, Wentong Li, Xuehui Wang, Yue Zhou, Junchi Yan

TL;DR
H2RBox is a novel oriented object detection method that uses only horizontal box annotations with weakly- and self-supervised learning to predict object angles, achieving performance comparable to rotated box methods.
Contribution
This paper introduces H2RBox, the first oriented object detector trained solely with horizontal box annotations, bridging the gap between available data and oriented detection needs.
Findings
H2RBox achieves performance close to rotated box-supervised detectors.
H2RBox outperforms horizontal box-supervised instance segmentation methods in speed and robustness.
H2RBox requires less memory and is more efficient in complex scenes.
Abstract
Oriented object detection arises in many applications, from aerial imagery to autonomous driving, yet many existing detection benchmarks are annotated only with horizontal bounding boxes, which are also less costly to produce than fine-grained rotated boxes. This leaves a gap between the readily available training corpus and the rising demand for oriented object detection. This paper proposes a simple yet effective oriented object detection approach called H2RBox that uses only horizontal box annotations for weakly-supervised training, which closes the above gap and shows competitive performance even against detectors trained with rotated boxes. The core of our method is weakly- and self-supervised learning, which predicts the angle of the object by learning the consistency of two different views. To the best of our knowledge, H2RBox is the first horizontal box annotation-based oriented object detector. Compared…
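The two ideas named in the abstract, supervision from horizontal boxes and consistency between two views, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the (cx, cy, w, h, theta) box parameterization, and the assumption that the second view is a copy of the image rotated by a known angle are all hypothetical choices made for illustration. The first function computes the horizontal box that encloses a rotated box, which is the quantity a horizontal annotation can constrain. The second measures how far two per-view angle predictions are from agreeing once the known rotation is accounted for.

```python
import math


def circumscribed_hbox(cx, cy, w, h, theta):
    """Return (x_min, y_min, x_max, y_max) of the axis-aligned box
    enclosing a rotated box with center (cx, cy), size (w, h) and
    rotation theta in radians. Hypothetical helper for illustration."""
    half_w = 0.5 * (w * abs(math.cos(theta)) + h * abs(math.sin(theta)))
    half_h = 0.5 * (w * abs(math.sin(theta)) + h * abs(math.cos(theta)))
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def angle_consistency_error(theta_view1, theta_view2, rotation, period=math.pi):
    """Absolute angular gap between the angle predicted in the second view
    and the first view's angle shifted by the known rotation.
    The gap is wrapped into [-period/2, period/2) because a rectangle's
    orientation repeats every `period` radians (assumed to be pi here)."""
    diff = theta_view2 - theta_view1 - rotation
    diff = (diff + period / 2) % period - period / 2
    return abs(diff)


if __name__ == "__main__":
    # Two different orientations share one horizontal box, so the
    # horizontal annotation alone cannot pin down the angle.
    print(circumscribed_hbox(0.0, 0.0, 4.0, 2.0, 0.4))
    print(circumscribed_hbox(0.0, 0.0, 4.0, 2.0, -0.4))
    # A prediction that tracks the rotation gives an error near zero.
    print(angle_consistency_error(0.3, 0.8, 0.5))
```

The demo at the bottom shows why a second signal is useful: boxes rotated by +0.4 and -0.4 radians produce the same enclosing horizontal box, whereas a consistency term between views penalizes an angle that does not follow the applied rotation.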
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
