AdaScale: Towards Real-time Video Object Detection Using Adaptive Scaling

Ting-Wu Chin; Ruizhou Ding; Diana Marculescu

arXiv:1902.02910 · cs.CV · February 11, 2019 · 39 cites

TL;DR

AdaScale introduces an adaptive image scaling method that enhances both the speed and accuracy of video object detection in real-time systems by selecting optimal resolutions dynamically.

Contribution

The paper proposes a novel adaptive scaling approach that improves video object detection accuracy and speed simultaneously, challenging the traditional speed-accuracy trade-off.

Findings

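01 Improves mAP by 1.3 points on ImageNet VID and by 2.7 points on mini YouTube-BoundingBoxes.

02 Speeds up detection by 1.6x on ImageNet VID and by 1.8x on mini YouTube-BoundingBoxes.

03 Adds an extra 1.25x speedup on top of state-of-the-art video acceleration work.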

Abstract

In vision-enabled autonomous systems such as robots and autonomous cars, video object detection plays a crucial role, and both its speed and accuracy are important factors to provide reliable operation. The key insight we show in this paper is that speed and accuracy are not necessarily a trade-off when it comes to image scaling. Our results show that re-scaling the image to a lower resolution will sometimes produce better accuracy. Based on this observation, we propose a novel approach, dubbed AdaScale, which adaptively selects the input image scale that improves both accuracy and speed for video object detection. To this end, our results on ImageNet VID and mini YouTube-BoundingBoxes datasets demonstrate 1.3 points and 2.7 points mAP improvement with 1.6x and 1.8x speedup, respectively. Additionally, we improve state-of-the-art video acceleration work by an extra 1.25x speedup with…
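The idea of adaptively choosing an input scale can be illustrated with a minimal Python sketch. It is not the authors' implementation: the candidate scales, the lowest-loss selection rule, and the names `resized_shape`, `best_scale`, `run_adaptive`, and `choose_next_scale` are all assumptions made for illustration, and the abstract does not specify how the scale is actually selected.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical candidate scales (shortest side, in pixels); not taken from the paper.
CANDIDATE_SCALES: Tuple[int, ...] = (600, 480, 360, 240)

Shape = Tuple[int, int]  # (height, width)


def resized_shape(height: int, width: int, shortest_side: int) -> Shape:
    """Return (height, width) after scaling so the shorter side equals shortest_side."""
    factor = shortest_side / min(height, width)
    return round(height * factor), round(width * factor)


def best_scale(loss_per_scale: Dict[int, float]) -> int:
    """Assumed selection rule: lowest loss wins; ties go to the smaller (faster) scale."""
    return min(loss_per_scale, key=lambda s: (loss_per_scale[s], s))


def run_adaptive(
    frame_shapes: Sequence[Shape],
    choose_next_scale: Callable[[Shape], int],
    initial_scale: int = CANDIDATE_SCALES[0],
) -> List[Tuple[int, Shape]]:
    """Process frames in order, letting each frame decide the scale of the next one.

    choose_next_scale is a hypothetical stand-in for whatever component
    predicts the scale; here it only sees the resized frame shape.
    """
    scale = initial_scale
    used: List[Tuple[int, Shape]] = []
    for height, width in frame_shapes:
        shape = resized_shape(height, width, scale)
        used.append((scale, shape))  # a detector would run on the resized frame here
        scale = choose_next_scale(shape)
    return used
```

For example, `best_scale({600: 0.9, 480: 0.7, 240: 0.7})` prefers the smaller of the two tied scales, which mirrors the abstract's observation that a lower resolution can be both faster and at least as accurate.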

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Video Surveillance and Tracking Methods

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings