Learning Spatial Fusion for Single-Shot Object Detection
Songtao Liu, Di Huang, Yunhong Wang

TL;DR
This paper introduces adaptively spatial feature fusion (ASFF), a data-driven method to improve scale-invariance in single-shot object detection by effectively fusing features across scales with minimal additional computational cost.
Contribution
It proposes a novel ASFF strategy that learns to spatially filter conflicting information, enhancing feature consistency across scales in single-shot detectors like YOLOv3.
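As a rough illustration of the idea of learning to spatially filter information across scales, the sketch below fuses three pyramid levels with per-pixel weights that are normalized by a softmax across levels. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the names `resize_nearest`, `asff_fuse`, and `weight_kernels` are hypothetical, all levels are assumed to already share the same channel count, nearest-neighbour resizing stands in for whatever resizing the detector actually uses, and the weight kernels are random here rather than learned.

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) array to (C, out_h, out_w)."""
    _, h, w = x.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return x[:, rows[:, None], cols[None, :]]

def asff_fuse(features, weight_kernels, target_level):
    """Fuse pyramid levels at the resolution of `target_level`.

    features:       list of (C, H_l, W_l) arrays, one per level (same C assumed)
    weight_kernels: list of (C,) arrays acting as 1x1 convolutions that map
                    each resized level to one logit per pixel (hypothetical
                    stand-ins for learned parameters)
    Returns the fused (C, H, W) map and the (L, H, W) per-pixel weights.
    """
    th, tw = features[target_level].shape[1:]
    resized = [resize_nearest(f, th, tw) for f in features]

    # A 1x1 convolution with a single output channel is a dot product
    # over the channel axis at every spatial position.
    logits = np.stack([np.tensordot(k, f, axes=([0], [0]))
                       for k, f in zip(weight_kernels, resized)])

    # Softmax across levels, so the weights at each pixel sum to 1.
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Weighted sum of the resized levels, with weights shared across channels.
    fused = sum(w[None] * f for w, f in zip(weights, resized))
    return fused, weights

# Toy usage: three levels with 4 channels each, fused at the middle level.
rng = np.random.default_rng(0)
levels = [rng.normal(size=(4, s, s)) for s in (8, 4, 2)]
kernels = [rng.normal(size=4) for _ in levels]
fused, weights = asff_fuse(levels, kernels, target_level=1)
```

Because the weights vary per pixel, a level whose features conflict with the others at a given location can be driven toward zero there while still contributing elsewhere. The extra work in this sketch is a few single-channel 1x1 convolutions and one softmax per fused level.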
Findings
Achieves 38.1% AP at 60 FPS on MS COCO
Improves scale-invariance of features in object detection
Adds almost no inference overhead
Abstract
Pyramidal feature representation is the common practice to address the challenge of scale variation in object detection. However, the inconsistency across different feature scales is a primary limitation for the single-shot detectors based on feature pyramid. In this work, we propose a novel and data driven strategy for pyramidal feature fusion, referred to as adaptively spatial feature fusion (ASFF). It learns the way to spatially filter conflictive information to suppress the inconsistency, thus improving the scale-invariance of features, and introduces nearly free inference overhead. With the ASFF strategy and a solid baseline of YOLOv3, we achieve the best speed-accuracy trade-off on the MS COCO dataset, reporting 38.1% AP at 60 FPS, 42.4% AP at 45 FPS and 43.9% AP at 29 FPS. The code is available at https://github.com/ruinmessi/ASFF
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
Methods: Residual Connection · Average Pooling · Logistic Regression · k-Means Clustering · Softmax · 1x1 Convolution · Feature Pyramid Network · Max Pooling · Global Average Pooling
