FSSD: Feature Fusion Single Shot Multibox Detector
Zuoxin Li, Lu Yang, Fuqiang Zhou

TL;DR
FSSD adds a lightweight feature fusion module to SSD that combines features from multiple scales, improving object detection accuracy with only a small reduction in speed.
Contribution
The paper proposes a novel feature fusion module for SSD that significantly boosts detection accuracy while maintaining high speed, addressing the challenge of fusing multi-scale features.
Findings
Achieves 82.7 mAP on the Pascal VOC 2007 test set at 65.8 FPS with a 300×300 input
Outperforms conventional SSD on the COCO dataset
Improves accuracy with minimal speed loss
Abstract
SSD (Single Shot Multibox Detector) is one of the best object detection algorithms with both high accuracy and fast speed. However, SSD's feature pyramid detection method makes it hard to fuse the features from different scales. In this paper, we propose FSSD (Feature Fusion Single Shot Multibox Detector), an enhanced SSD with a novel and lightweight feature fusion module which can improve the performance significantly over SSD with just a little speed drop. In the feature fusion module, features from different layers with different scales are concatenated together, followed by some down-sampling blocks to generate a new feature pyramid, which is fed to multibox detectors to predict the final detection results. On the Pascal VOC 2007 test, our network can achieve 82.7 mAP (mean average precision) at the speed of 65.8 FPS (frames per second) with the input size 300×300 using a…
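A minimal pure-Python sketch of the fusion step the abstract describes: source feature maps of different resolutions are resized to a common size, concatenated along the channel axis, and then repeatedly down-sampled to form a new feature pyramid. The function names, the toy map sizes, nearest-neighbour resizing, and 2×2 average pooling are illustrative assumptions, not the authors' implementation. A real FSSD uses learned convolutions (the taxonomy lists Convolution and 1x1 Convolution) where this sketch uses parameter-free stand-ins.

```python
from typing import List

# A feature map is stored as channels x height x width nested lists (square maps assumed).
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, size: int) -> FeatureMap:
    """Resize every channel to size x size by nearest-neighbour sampling (stand-in for learned/bilinear resizing)."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[ch[(y * h) // size][(x * w) // size] for x in range(size)] for y in range(size)]
        for ch in fmap
    ]


def concat_channels(maps: List[FeatureMap]) -> FeatureMap:
    """Stack feature maps of equal spatial size along the channel axis."""
    fused: FeatureMap = []
    for m in maps:
        fused.extend(m)
    return fused


def downsample(fmap: FeatureMap) -> FeatureMap:
    """Halve the resolution with 2x2 average pooling, stride 2, keeping partial edge windows.

    Hypothetical stand-in for a learned stride-2 down-sampling block.
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    out_h, out_w = (h + 1) // 2, (w + 1) // 2
    out: FeatureMap = []
    for ch in fmap:
        rows = []
        for y in range(out_h):
            row = []
            for x in range(out_w):
                window = [
                    ch[yy][xx]
                    for yy in range(2 * y, min(2 * y + 2, h))
                    for xx in range(2 * x, min(2 * x + 2, w))
                ]
                row.append(sum(window) / len(window))
            rows.append(row)
        out.append(rows)
    return out


def fuse_and_build_pyramid(sources: List[FeatureMap], levels: int) -> List[FeatureMap]:
    """Resize all sources to the largest one, concatenate them, then down-sample to build the pyramid."""
    target = max(len(m[0]) for m in sources)
    fused = concat_channels([upsample_nearest(m, target) for m in sources])
    pyramid = [fused]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def constant_map(channels: int, size: int, value: float) -> FeatureMap:
    """Build a toy feature map filled with a single value."""
    return [[[value] * size for _ in range(size)] for _ in range(channels)]


# Toy sources: 2 channels at 8x8, 3 channels at 4x4, 1 channel at 2x2.
sources = [constant_map(2, 8, 1.0), constant_map(3, 4, 2.0), constant_map(1, 2, 3.0)]
pyramid = fuse_and_build_pyramid(sources, levels=4)
print([(len(p), len(p[0]), len(p[0][0])) for p in pyramid])
# → [(6, 8, 8), (6, 4, 4), (6, 2, 2), (6, 1, 1)]
```

Because every level of the new pyramid is derived from the same fused map, each level carries channels from all source layers, rather than each detection scale seeing only the features of a single backbone layer.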
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution · Non Maximum Suppression · 1x1 Convolution · SSD
