Bounding Box Regression with Uncertainty for Accurate Object Detection
Yihui He, Chenchen Zhu, Jianren Wang, Marios Savvides, Xiangyu Zhang

TL;DR
This paper introduces a bounding box regression loss that models localization uncertainty, improving the localization accuracy of object detectors on MS-COCO with nearly no additional computation.
Contribution
It proposes a new loss function that jointly learns bounding box transformations and localization variance, enabling better bounding box refinement and merging during NMS.
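One common way to learn a box coordinate together with its variance is to train with a Gaussian negative log-likelihood, where the network predicts both the coordinate and a log-variance. The sketch below illustrates that general idea in plain Python. It is an assumption made for illustration, not necessarily the paper's exact loss, and the names `gaussian_nll` and `box_nll` are hypothetical.

```python
import math


def gaussian_nll(pred, log_var, target):
    """Negative log-likelihood of `target` under N(pred, exp(log_var)).

    The constant 0.5 * log(2 * pi) is dropped because it does not affect training.
    Predicting log-variance (an assumption here) keeps the variance positive.
    """
    return 0.5 * math.exp(-log_var) * (target - pred) ** 2 + 0.5 * log_var


def box_nll(pred_box, log_vars, target_box):
    """Sum the per-coordinate loss over the four box coordinates."""
    return sum(
        gaussian_nll(p, a, t) for p, a, t in zip(pred_box, log_vars, target_box)
    )


# Illustrative values only: the first coordinate is off by 2 units.
pred = [2.0, 0.0, 10.0, 10.0]
target = [0.0, 0.0, 10.0, 10.0]
confident = box_nll(pred, [0.0, 0.0, 0.0, 0.0], target)
uncertain = box_nll(pred, [math.log(4.0), 0.0, 0.0, 0.0], target)
```

With this form of loss, a large error is penalized less when the predicted variance is high, while the `0.5 * log_var` term discourages the model from reporting high variance everywhere.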
Findings
Boosts VGG-16 Faster R-CNN AP from 23.6% to 29.1%.
Improves ResNet-50-FPN Mask R-CNN AP by 1.8% and AP90 by 6.2%.
Significantly outperforms previous bounding box refinement methods.
Abstract
Large-scale object detection datasets (e.g., MS-COCO) try to define the ground truth bounding boxes as clearly as possible. However, we observe that ambiguities are still introduced when labeling the bounding boxes. In this paper, we propose a novel bounding box regression loss for learning bounding box transformation and localization variance together. Our loss greatly improves the localization accuracies of various architectures with nearly no additional computation. The learned localization variance allows us to merge neighboring bounding boxes during non-maximum suppression (NMS), which further improves the localization performance. On MS-COCO, we boost the Average Precision (AP) of VGG-16 Faster R-CNN from 23.6% to 29.1%. More importantly, for ResNet-50-FPN Mask R-CNN, our method improves the AP and AP90 by 1.8% and 6.2% respectively, which significantly outperforms previous…
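The idea of merging neighboring boxes using learned variance can be sketched as an inverse-variance weighted average of the boxes that overlap a selected box. The code below is a minimal illustration under stated assumptions: the weighting rule, the `iou_threshold` value, and the names `iou` and `merge_with_variance` are hypothetical, and the paper's exact merging scheme is not reproduced here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_with_variance(selected, boxes, variances, iou_threshold=0.5):
    """Refine `selected` by averaging overlapping boxes per coordinate.

    Each neighbor is weighted by 1 / variance for that coordinate, so
    coordinates predicted with low variance contribute more.
    Variances are assumed to be strictly positive.
    """
    merged = []
    for k in range(4):
        num = den = 0.0
        for box, var in zip(boxes, variances):
            if iou(selected, box) >= iou_threshold:
                weight = 1.0 / var[k]
                num += weight * box[k]
                den += weight
        # Fall back to the original coordinate if no box passed the threshold.
        merged.append(num / den if den > 0 else selected[k])
    return merged


# Illustrative values only: two overlapping boxes and one distant box.
boxes = [[0.0, 0.0, 10.0, 10.0], [1.0, 0.0, 11.0, 10.0], [100.0, 100.0, 110.0, 110.0]]
variances = [[1.0, 1.0, 1.0, 1.0], [3.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
refined = merge_with_variance(boxes[0], boxes, variances)
```

In a full pipeline this step would run each time NMS selects a box, before its neighbors are suppressed; the distant third box has no overlap with the selected box and so does not affect the result.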
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: RoIAlign · Average Pooling · Mask R-CNN · Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block
