Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
Mohammad Javad Shafiee, Brendan Chwyl, Francis Li, and Alexander Wong

TL;DR
Fast YOLO is a framework that accelerates YOLOv2 for real-time object detection on embedded devices by optimizing network architecture and introducing motion-adaptive inference, achieving significant speedups and reduced power consumption.
Contribution
The paper introduces Fast YOLO, which combines architecture optimization and motion-adaptive inference to enable real-time object detection on resource-limited embedded systems.
Findings
Achieves ~3.3X speedup over YOLOv2
Reduces deep inferences by 38.13% on average
Runs at ~18 FPS on an Nvidia Jetson TX1
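The drop in deep inferences comes from motion-adaptive inference: the expensive detector runs only when the scene has changed enough, and earlier detections are reused otherwise. The sketch below shows that gating idea in Python. It is an illustration, not the authors' implementation. This summary does not say how motion is measured, so mean absolute pixel difference against the last fully processed frame is used purely as a stand-in, and the names `mean_abs_diff`, `motion_adaptive_detect`, `detector`, and `threshold` are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

# A frame is modelled as a 2-D grid of grayscale intensities (assumption).
Frame = Sequence[Sequence[float]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def motion_adaptive_detect(
    frames: Sequence[Frame],
    detector: Callable[[Frame], Any],
    threshold: float = 5.0,
) -> Tuple[List[Any], int]:
    """Run `detector` only on frames that differ enough from the reference.

    Returns the per-frame results and the number of deep inferences performed.
    `detector` stands in for any expensive model call (hypothetical).
    """
    results: List[Any] = []
    reference = None  # last frame that went through the detector
    cached = None     # detections produced for the reference frame
    deep_runs = 0
    for frame in frames:
        if reference is None or mean_abs_diff(frame, reference) > threshold:
            cached = detector(frame)  # deep inference
            reference = frame
            deep_runs += 1
        results.append(cached)        # reuse the cached result otherwise
    return results, deep_runs
```

Under this scheme the fraction of skipped inferences is `1 - deep_runs / len(frames)`, which is the kind of quantity the 38.13% figure reports. The threshold trades accuracy for speed: a higher value skips more frames but risks reusing stale detections.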
Abstract
Object detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it remains very challenging to leverage this approach for real-time object detection in video on embedded computing devices with limited computational power and limited memory. In this paper, we propose a new framework called Fast YOLO, a fast You Only Look Once framework which accelerates YOLOv2 to be able to perform…
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · CCD and CMOS Imaging Sensors
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Average Pooling · Global Average Pooling · 1x1 Convolution · Batch Normalization · Max Pooling · Softmax · Convolution · Darknet-19 · YOLOv2
