Speed/accuracy trade-offs for modern convolutional object detectors

Jonathan Huang; Vivek Rathod; Chen Sun; Menglong Zhu; Anoop Korattikara; Alireza Fathi; Ian Fischer; Zbigniew Wojna; Yang Song; Sergio Guadarrama; Kevin Murphy

arXiv:1611.10012·cs.CV·April 26, 2017·157 cites

TL;DR

This paper systematically compares modern convolutional object detectors, analyzing how different architectures, feature extractors, and parameters affect the speed, accuracy, and memory usage to guide optimal selection for various applications.

Contribution

The paper provides a unified implementation of key detection architectures and maps out their speed/accuracy trade-offs, enabling better-informed choices for different deployment scenarios.

Findings

01. Real-time detector suitable for mobile devices.

02. State-of-the-art accuracy achieved on the COCO dataset.

03. Trade-off curves illustrating performance across architectures.

Abstract

The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware and software platforms. We present a unified implementation of the Faster R-CNN [Ren et al., 2015], R-FCN [Dai et al., 2016] and SSD [Liu et al., 2015] systems, which we view as "meta-architectures" and trace out the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical…
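As a rough illustration of what "tracing out the speed/accuracy trade-off curve" can mean in practice, the sketch below picks out the configurations that are not beaten on both speed and accuracy by any other configuration (a Pareto frontier). The `Measurement` type, the `pareto_frontier` helper, the configuration names, and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper or its implementation.

```python
from typing import List, NamedTuple


class Measurement(NamedTuple):
    name: str            # hypothetical configuration label
    ms_per_image: float  # inference time per image (lower is better)
    accuracy: float      # detection accuracy, e.g. mAP (higher is better)


def pareto_frontier(points: List[Measurement]) -> List[Measurement]:
    """Return the configurations that no faster configuration matches or beats in accuracy."""
    frontier = []
    best_accuracy = float("-inf")
    # Visit fastest first; on equal time, visit the more accurate one first.
    for p in sorted(points, key=lambda p: (p.ms_per_image, -p.accuracy)):
        if p.accuracy > best_accuracy:
            frontier.append(p)
            best_accuracy = p.accuracy
    return frontier


# Made-up measurements purely for illustration; not results from the paper.
configs = [
    Measurement("config_a", 30.0, 0.20),
    Measurement("config_b", 80.0, 0.28),
    Measurement("config_c", 90.0, 0.25),  # slower and less accurate than config_b
    Measurement("config_d", 400.0, 0.35),
]

for m in pareto_frontier(configs):
    print(f"{m.name}: {m.ms_per_image} ms/image, accuracy {m.accuracy}")
```

With these made-up inputs, `config_c` is dropped because `config_b` is both faster and more accurate; the remaining points form the curve along which one would choose a detector for a given time budget.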

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Dropout · Dense Connections · Max Pooling · Non Maximum Suppression · 1x1 Convolution · SSD · Region Proposal Network