RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection
Zhuofan Zong, Qianggang Cao, Biao Leng

TL;DR
RCNet introduces a novel architecture with reverse feature pyramid and cross-scale shift network to enhance multi-scale feature fusion in object detection, improving accuracy and efficiency over traditional FPN-based methods.
Contribution
The paper proposes RCNet, a new architecture that simplifies bidirectional feature fusion and propagates multi-scale features more effectively, outperforming existing methods with minimal computational overhead.
Findings
RCNet improves detection AP by up to 3.7 points over baseline.
RetinaNet with RCNet achieves 40.2 AP, surpassing previous models.
RCNet demonstrates strong performance on the MS COCO dataset, including the COCO test-dev split.
Abstract
Feature pyramid networks (FPN) are widely exploited for multi-scale feature fusion in existing advanced object detection frameworks. Numerous previous works have developed various structures for bidirectional feature fusion, all of which are shown to improve the detection performance effectively. We observe that these complicated network structures require feature pyramids to be stacked in a fixed order, which introduces longer pipelines and reduces the inference speed. Moreover, semantics from non-adjacent levels are diluted in the feature pyramid since only features at adjacent pyramid levels are merged by the local fusion operation in a sequence manner. To address these issues, we propose a novel architecture named RCNet, which consists of Reverse Feature Pyramid (RevFP) and Cross-scale Shift Network (CSN). RevFP utilizes local bidirectional feature fusion to simplify the…
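To make the abstract's point about local fusion concrete, here is a minimal sketch of conventional FPN-style top-down fusion, not of RCNet itself. It is a toy under stated assumptions: feature maps are 1-D lists of floats rather than 2-D tensors, the lateral 1x1 convolutions of a real FPN are omitted, and the names `upsample2x`, `top_down_fuse`, and `pyramid` are hypothetical. The only thing it illustrates is that each level is merged with its immediate neighbour alone, so information from a distant level reaches it only through the intermediate levels.

```python
# Toy 1-D illustration of FPN-style top-down fusion (NOT the RCNet method).
# Assumption: each pyramid level is a list of floats; level 0 is the finest.

def upsample2x(feature):
    """Nearest-neighbour upsampling: repeat every element twice."""
    return [value for value in feature for _ in range(2)]

def top_down_fuse(levels):
    """Merge levels from coarsest to finest, one adjacent pair at a time."""
    fused = [None] * len(levels)
    fused[-1] = list(levels[-1])  # the coarsest level has nothing above it
    for i in range(len(levels) - 2, -1, -1):
        upsampled = upsample2x(fused[i + 1])
        # Local fusion: level i only ever sees its immediate neighbour i + 1.
        fused[i] = [a + b for a, b in zip(levels[i], upsampled)]
    return fused

# Hypothetical three-level pyramid with lengths 4, 2, 1.
pyramid = [[1.0, 1.0, 1.0, 1.0], [10.0, 10.0], [100.0]]
fused = top_down_fuse(pyramid)
```

The loop runs strictly in sequence: level 0 cannot be computed until level 1 has been fused with level 2, which is the fixed ordering the abstract associates with longer pipelines.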
Taxonomy
Methods: Feature Pyramid Network · Convolution · 1x1 Convolution · Focal Loss · RetinaNet
