FSSD: Feature Fusion Single Shot Multibox Detector

Zuoxin Li; Lu Yang; Fuqiang Zhou

arXiv:1712.00960 · cs.CV · February 26, 2024 · 391 cites

TL;DR

FSSD introduces a lightweight feature fusion module to enhance SSD's object detection accuracy by effectively combining multi-scale features, resulting in improved performance with minimal speed reduction.

Contribution

The paper proposes a novel feature fusion module for SSD that significantly boosts detection accuracy while maintaining high speed, addressing the challenge of fusing multi-scale features.

Findings

01 Achieves 82.7 mAP on the Pascal VOC 2007 test set at 65.8 FPS

02 Outperforms conventional SSD on the COCO dataset

03 Improves accuracy with minimal speed loss

Abstract

SSD (Single Shot Multibox Detector) is one of the best object detection algorithms, offering both high accuracy and fast speed. However, SSD's feature pyramid detection method makes it hard to fuse features from different scales. In this paper, we propose FSSD (Feature Fusion Single Shot Multibox Detector), an enhanced SSD with a novel and lightweight feature fusion module that improves performance significantly over SSD with only a small drop in speed. In the feature fusion module, features from different layers with different scales are concatenated together, followed by some down-sampling blocks that generate a new feature pyramid, which is fed to multibox detectors to predict the final detection results. On the Pascal VOC 2007 test set, our network achieves 82.7 mAP (mean average precision) at a speed of 65.8 FPS (frames per second) with an input size of 300 $\times$ 300 using a…
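
The fuse-then-downsample flow described in the abstract can be illustrated with a toy, dependency-free sketch. This is not the authors' implementation: the abstract only says that features are "concatenated" and passed through "down-sampling blocks", so the nearest-neighbour resizing and the 2x2 max pooling used here are assumptions standing in for whatever learned layers the real network uses, and all function names, channel counts, and map sizes are hypothetical.

```python
# Toy sketch of a feature fusion module: resize, concatenate, then downsample.
# Feature maps are nested lists shaped [channels][height][width].
# ASSUMPTIONS: nearest-neighbour resizing and max pooling replace the learned
# layers of the real network; every name and size below is hypothetical.

def upsample_nearest(fmap, size):
    """Resize every channel to size x size by nearest-neighbour sampling."""
    out = []
    for channel in fmap:
        h, w = len(channel), len(channel[0])
        out.append([[channel[r * h // size][c * w // size] for c in range(size)]
                    for r in range(size)])
    return out

def fuse(feature_maps):
    """Bring all maps to the largest resolution, then concatenate channels."""
    target = max(len(fmap[0]) for fmap in feature_maps)
    fused = []
    for fmap in feature_maps:
        fused.extend(upsample_nearest(fmap, target))
    return fused

def downsample(fmap):
    """Halve the resolution (rounding up) with a 2x2, stride-2 max pool."""
    out = []
    for channel in fmap:
        h, w = len(channel), len(channel[0])
        out.append([[max(channel[r][c]
                         for r in range(2 * i, min(2 * i + 2, h))
                         for c in range(2 * j, min(2 * j + 2, w)))
                     for j in range((w + 1) // 2)]
                    for i in range((h + 1) // 2)])
    return out

def build_pyramid(feature_maps, levels):
    """Fuse once, then repeatedly downsample to get a multi-scale pyramid."""
    pyramid = [fuse(feature_maps)]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def constant_map(channels, size, value):
    """Helper that builds a dummy feature map filled with one value."""
    return [[[value] * size for _ in range(size)] for _ in range(channels)]

def shape(fmap):
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))

# Three dummy backbone outputs at decreasing resolution.
backbone = [constant_map(2, 8, 1.0), constant_map(3, 4, 2.0), constant_map(1, 2, 3.0)]
pyramid = build_pyramid(backbone, levels=4)
shapes = [shape(level) for level in pyramid]
print(shapes)
```

Each level of `pyramid` would be handed to its own detection head in an SSD-style detector. The point of the sketch is that every level is derived from the same fused tensor, so each scale carries information from all of the input layers.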

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution · Non Maximum Suppression · 1x1 Convolution · SSD