Looking Fast and Slow: Memory-Guided Mobile Video Object Detection
Mason Liu, Menglong Zhu, Marie White, Yinxiao Li, Dmitry Kalenichenko

TL;DR
This paper introduces a memory-guided approach to mobile video object detection, motivated by how humans grasp the gist of a scene. It interleaves a conventional feature extractor with an extremely lightweight one, carries information across frames in a temporal memory, and uses an adaptive inference policy, reaching high accuracy at high speed.
Contribution
It proposes a memory-based detection framework that reduces computation and improves accuracy on mobile devices, and it uses reinforcement learning to learn an adaptive inference policy.
Findings
Achieves state-of-the-art accuracy among mobile methods on ImageNet VID 2015
Runs at over 70 FPS on a Pixel 3 phone
Needs minimal computation to produce accurate detections when temporal memory is present, because the lightweight extractor only has to recognize the gist of the scene
Abstract
With a single eye fixation lasting a fraction of a second, the human visual system is capable of forming a rich representation of a complex environment, reaching a holistic understanding which facilitates object recognition and detection. This phenomenon is known as recognizing the "gist" of the scene and is accomplished by relying on relevant prior knowledge. This paper addresses the analogous question of whether using memory in computer vision systems can not only improve the accuracy of object detection in video streams, but also reduce the computation time. By interleaving conventional feature extractors with extremely lightweight ones which only need to recognize the gist of the scene, we show that minimal computation is required to produce accurate detections when temporal memory is present. In addition, we show that the memory contains enough information for deploying…
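The interleaving idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `interleaved_features`, the fixed `interval` schedule, the toy extractors, and the running-average memory are all assumptions made for illustration. The abstract excerpt here does not describe the actual memory module or the learned adaptive policy, so a simple blend stands in for the memory and a fixed schedule stands in for the policy.

```python
from typing import Callable, List, Sequence, Tuple

Features = List[float]


def interleaved_features(
    frames: Sequence[Features],
    heavy: Callable[[Features], Features],
    light: Callable[[Features], Features],
    interval: int = 10,
    blend: float = 0.5,
) -> Tuple[List[Features], List[str]]:
    """Run `heavy` every `interval` frames and `light` on the others.

    A running average (an assumed stand-in for a learned temporal
    memory) fuses each new feature vector with what was seen before.
    Returns the fused features per frame and the extractor schedule.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    memory: Features = []
    outputs: List[Features] = []
    schedule: List[str] = []
    for index, frame in enumerate(frames):
        use_heavy = index % interval == 0
        feats = heavy(frame) if use_heavy else light(frame)
        schedule.append("heavy" if use_heavy else "light")
        if not memory:
            # First frame: the memory starts from the extracted features.
            memory = list(feats)
        else:
            # Later frames: blend the old memory with the new features.
            memory = [blend * m + (1.0 - blend) * f for m, f in zip(memory, feats)]
        outputs.append(list(memory))
    return outputs, schedule


# Hypothetical toy extractors: the "heavy" one keeps full precision,
# the "light" one only keeps a coarse, rounded view of the frame.
def toy_heavy(frame: Features) -> Features:
    return [float(x) for x in frame]


def toy_light(frame: Features) -> Features:
    return [float(round(x)) for x in frame]


if __name__ == "__main__":
    video = [[1.2, 2.7]] * 5
    fused, plan = interleaved_features(video, toy_heavy, toy_light, interval=3)
    print(plan)
    print(fused)
```

With `interval=3`, only two of the five frames go through the expensive extractor; the memory lets the cheap frames reuse detail captured earlier. An adaptive policy, as mentioned in the summary above, would replace the fixed `index % interval` rule with a learned decision.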
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection
