Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video

Mohammad Javad Shafiee, Brendan Chywl, Francis Li, and Alexander Wong

arXiv:1709.05943 · cs.CV · September 19, 2017 · 71 cites

TL;DR

Fast YOLO is a framework that accelerates YOLOv2 for real-time object detection on embedded devices by optimizing network architecture and introducing motion-adaptive inference, achieving significant speedups and reduced power consumption.

Contribution

The paper introduces Fast YOLO, which combines architecture optimization and motion-adaptive inference to enable real-time object detection on resource-limited embedded systems.
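As a rough illustration of the idea behind motion-adaptive inference, the sketch below runs an expensive detector only when a frame differs enough from the last frame that was fully processed, and otherwise reuses the cached detections. This listing does not describe how the paper measures motion, so the mean absolute pixel difference used here is a stand-in assumption and not the authors' method. The names `MotionGatedDetector`, `mean_abs_diff`, `fake_detector`, and the threshold value are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A grayscale frame as rows of pixel intensities (0-255). Hypothetical format.
Frame = Sequence[Sequence[int]]
Detection = Tuple[str, int, int, int, int]  # (label, x, y, width, height)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute intensity difference of two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


class MotionGatedDetector:
    """Runs the detector only when the frame changed enough (assumed gating rule)."""

    def __init__(self, detector: Callable[[Frame], List[Detection]],
                 threshold: float = 8.0) -> None:
        self.detector = detector
        self.threshold = threshold          # assumed value, tune per camera/scene
        self.reference: Optional[Frame] = None
        self.cached: List[Detection] = []
        self.deep_inferences = 0            # how many times the detector ran

    def process(self, frame: Frame) -> List[Detection]:
        needs_inference = (
            self.reference is None
            or mean_abs_diff(frame, self.reference) > self.threshold
        )
        if needs_inference:
            self.cached = self.detector(frame)
            self.reference = frame          # this frame becomes the new reference
            self.deep_inferences += 1
        return self.cached                  # otherwise reuse the last detections


def fake_detector(frame: Frame) -> List[Detection]:
    """Placeholder for a real network such as YOLOv2; returns one dummy box."""
    return [("object", 0, 0, len(frame[0]), len(frame))]


still = [[10] * 4 for _ in range(4)]
moved = [[200] * 4 for _ in range(4)]
gate = MotionGatedDetector(fake_detector, threshold=8.0)
results = [gate.process(f) for f in [still, still, still, moved, moved]]
print(f"{gate.deep_inferences} deep inferences for {len(results)} frames")
```

In this toy sequence the detector runs only on the first frame and on the first changed frame, which is the kind of saving the reported reduction in deep inferences refers to. A real system would also need a policy for slow drift, where small per-frame changes accumulate without ever crossing the threshold.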

Findings

01

Achieves ~3.3X speedup over YOLOv2

02

Reduces deep inferences by 38.13% on average

03

Runs at ~18 FPS on Nvidia Jetson TX1

Abstract

Object detection is considered one of the most challenging problems in computer vision, as it combines object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been shown to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it remains very challenging to leverage this approach for real-time object detection in video on embedded computing devices with limited computational power and limited memory. In this paper, we propose a new framework called Fast YOLO, a fast You Only Look Once framework which accelerates YOLOv2 to be able to perform…

Code & Models

Repositories

Vonski/wdi19

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · CCD and CMOS Imaging Sensors

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Average Pooling · Global Average Pooling · 1x1 Convolution · Batch Normalization · Max Pooling · Softmax · Convolution · Darknet-19 · YOLOv2