Residual Bi-Fusion Feature Pyramid Network for Accurate Single-shot Object Detection
Ping-Yang Chen, Jun-Wei Hsieh, Chien-Yao Wang, Hong-Yuan Mark Liao, and Munkhjargal Gochoo

TL;DR
This paper introduces a residual bi-fusion feature pyramid network that improves object detection accuracy across scales by combining deep and shallow features bidirectionally, outperforming existing methods on VOC and MS COCO.
Contribution
It proposes a novel residual bi-fusion feature pyramid that improves multi-scale detection accuracy and makes training easier, especially with deeper backbones.
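The bidirectional fusion idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The summary does not specify the paper's fusion operators, so several things here are assumptions: element-wise addition, nearest-neighbour upsampling, 2x2 average pooling, a shared channel count across levels, and the absence of learned convolutions. The names `residual_bifusion`, `upsample2x`, and `downsample2x` are hypothetical. The sketch shows a top-down pass (deep to shallow), a bottom-up pass (shallow to deep), and a residual addition of each level's original input.

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map (assumed operator)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling with stride 2 on a (C, H, W) map; H and W assumed even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def residual_bifusion(features: list[np.ndarray]) -> list[np.ndarray]:
    """Hypothetical residual bidirectional fusion (illustrative only).

    features[0] is the shallowest (highest-resolution) level; each deeper
    level halves H and W. All levels are assumed to share the channel count C.
    """
    # Top-down pass: push semantically strong deep features to shallower levels.
    top_down = list(features)
    for i in range(len(features) - 2, -1, -1):
        top_down[i] = features[i] + upsample2x(top_down[i + 1])

    # Bottom-up pass: push positionally precise shallow features to deeper levels.
    bottom_up = list(top_down)
    for i in range(1, len(features)):
        bottom_up[i] = top_down[i] + downsample2x(bottom_up[i - 1])

    # Residual connection: add each level's original input back.
    return [f + b for f, b in zip(features, bottom_up)]


# Usage: a toy 3-level pyramid with 8 channels at 32x32, 16x16 and 8x8.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((8, 32 // 2**k, 32 // 2**k)) for k in range(3)]
fused = residual_bifusion(pyramid)
print([f.shape for f in fused])  # [(8, 32, 32), (8, 16, 16), (8, 8, 8)]
```

In a trained detector each addition would normally be combined with learned convolutions. Plain addition is used here only to keep the two information flows visible.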
Findings
Achieved state-of-the-art results on VOC and MS COCO datasets.
Improved detection accuracy for both small and large objects.
Enhanced training stability with deeper network layers.
Abstract
State-of-the-art (SoTA) models have improved object detection accuracy by a large margin using a feature pyramid (FP). An FP is a top-down aggregation that collects semantically strong features to improve scale invariance in both two-stage and one-stage detectors. However, this top-down pathway cannot preserve accurate object positions because of the shift-effect of pooling, so the FP's accuracy advantage disappears as more layers are used. The original FP also lacks a bottom-up pathway to compensate for the information lost from lower-layer feature maps. As a result, it performs well on large objects but poorly on small ones. This paper proposes a new structure, the "residual feature pyramid". It is bidirectional, fusing both deep and shallow features for more effective and robust detection of both small and large objects. Due to the…
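One simple reading of the abstract's positional argument can be reproduced with a toy NumPy example, which is not taken from the paper. After 2x2 max pooling and nearest-neighbour upsampling, a single-pixel "object" is localised only to within its pooling window, and the uncertainty doubles with every additional level. The helper names (`max_pool2x`, `upsample2x`, `one_hot_map`, `roundtrip`) and the single-pixel objects are hypothetical. Real detectors use learned strided convolutions, so this only illustrates the quantisation involved, not the paper's exact analysis of the shift-effect.

```python
import numpy as np


def max_pool2x(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 on an (H, W) map; H and W assumed even."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of an (H, W) map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def one_hot_map(size: int, row: int, col: int) -> np.ndarray:
    """Hypothetical toy 'small object': a single activation at (row, col)."""
    m = np.zeros((size, size))
    m[row, col] = 1.0
    return m


def roundtrip(x: np.ndarray, levels: int) -> np.ndarray:
    """Pool down `levels` times, then upsample back to the input resolution."""
    for _ in range(levels):
        x = max_pool2x(x)
    for _ in range(levels):
        x = upsample2x(x)
    return x


# Two tiny objects one pixel apart, inside the same 2x2 pooling window.
a = one_hot_map(8, 2, 2)
b = one_hot_map(8, 3, 3)

# After one pool/upsample round trip the two positions are indistinguishable.
print(np.array_equal(roundtrip(a, 1), roundtrip(b, 1)))  # True
print(np.argwhere(roundtrip(a, 1)).tolist())  # [[2, 2], [2, 3], [3, 2], [3, 3]]

# The region the object could occupy grows 4x per level: 4, 16, 64 pixels.
for levels in (1, 2, 3):
    print(levels, int(roundtrip(a, levels).sum()))
```

This is the kind of positional information a bottom-up pathway from higher-resolution, shallower maps is meant to restore.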
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
