Speed/accuracy trade-offs for modern convolutional object detectors
Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

TL;DR
This paper systematically compares modern convolutional object detectors, analyzing how the choice of architecture, feature extractor, and other parameters affects speed, accuracy, and memory usage, so that a detector can be matched to a given application.
Contribution
It provides a unified implementation of key detection architectures and maps out their speed/accuracy trade-offs, enabling better-informed choices for different deployment scenarios.
Findings
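At the speed- and memory-critical end, a detector that runs in real time and can be deployed on a mobile device.
At the accuracy-critical end, a detector that achieves state-of-the-art accuracy on the COCO dataset.
Trade-off curves showing how speed and accuracy vary across architectures and feature extractors.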
Abstract
The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware and software platforms. We present a unified implementation of the Faster R-CNN [Ren et al., 2015], R-FCN [Dai et al., 2016] and SSD [Liu et al., 2015] systems, which we view as "meta-architectures" and trace out the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical…
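The idea of tracing a speed/accuracy trade-off curve can be illustrated with a minimal sketch. It assumes each detector configuration has already been benchmarked for latency and accuracy, and it keeps only the configurations that no other configuration beats on both axes. The `Detector` type, the `pareto_frontier` helper, and every number below are hypothetical placeholders for illustration; they are not the paper's code or its measurements.

```python
from typing import List, NamedTuple


class Detector(NamedTuple):
    name: str          # hypothetical label, not a configuration from the paper
    latency_ms: float  # inference time per image (lower is better)
    accuracy: float    # e.g. mean average precision (higher is better)


def pareto_frontier(models: List[Detector]) -> List[Detector]:
    """Return the models that no other model beats on both speed and accuracy."""
    # Sort fastest first; among models with equal latency, most accurate first.
    ordered = sorted(models, key=lambda m: (m.latency_ms, -m.accuracy))
    frontier: List[Detector] = []
    best_accuracy = float("-inf")
    for m in ordered:
        # A slower model earns a place only if it is strictly more accurate
        # than every faster model seen so far.
        if m.accuracy > best_accuracy:
            frontier.append(m)
            best_accuracy = m.accuracy
    return frontier


# Placeholder numbers for illustration only; NOT measurements from the paper.
candidates = [
    Detector("config_a", 30.0, 0.20),
    Detector("config_b", 80.0, 0.28),
    Detector("config_c", 90.0, 0.25),  # slower and less accurate than config_b
    Detector("config_d", 400.0, 0.35),
]
frontier = pareto_frontier(candidates)
for m in frontier:
    print(f"{m.name}: {m.latency_ms:.0f} ms, accuracy {m.accuracy:.2f}")
```

Under these assumptions, choosing a detector for a given latency budget amounts to picking the most accurate entry of `frontier` whose `latency_ms` fits the budget. A memory budget could be handled the same way by adding a third field.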
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Dropout · Dense Connections · Max Pooling · Non Maximum Suppression · 1x1 Convolution · SSD · Region Proposal Network
