DEEPEYE: A Compact and Accurate Video Comprehension at Terminal Devices Compressed with Quantization and Tensorization
Yuan Cheng, Guangya Li, Hai-Bao Chen, Sheldon X.-D. Tan, Hao Yu

TL;DR
DEEPEYE is a compact, accurate video comprehension system for terminal devices that combines quantization and tensorization techniques to significantly reduce model size and computation while maintaining high accuracy.
Contribution
The paper introduces a novel integrated approach using quantization and tensorization for efficient video detection and recognition on resource-constrained devices.
Findings
Achieves 3.994x model compression with only 0.47% mAP loss
Reduces the parameter count by 15,047x and achieves a 2.87x speed-up
Improves accuracy by 16.58% on benchmark datasets
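To give a sense of how tensorization can yield parameter reductions of several orders of magnitude, the sketch below counts the parameters of a dense weight matrix against a tensor-train (TT) factorization of the same matrix. The mode sizes and ranks are hypothetical values chosen for illustration and are not taken from the paper, and the TT format is assumed here as one common form of tensorized compression.

```python
from math import prod


def dense_param_count(in_modes, out_modes):
    """Parameters of a dense matrix of shape prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tt_param_count(in_modes, out_modes, ranks):
    """Parameters of a tensor-train factorization of the same matrix.

    Core k has shape ranks[k] x in_modes[k] x out_modes[k] x ranks[k + 1],
    and the boundary ranks are fixed to 1.
    """
    if not (len(in_modes) == len(out_modes) == len(ranks) - 1):
        raise ValueError("need one more rank than there are modes")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary ranks must be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


# Hypothetical layer: 57,600 inputs and 256 outputs, each split into four modes.
in_modes = (8, 20, 20, 18)   # 8 * 20 * 20 * 18 = 57,600
out_modes = (4, 4, 4, 4)     # 4 * 4 * 4 * 4 = 256
ranks = (1, 4, 4, 4, 1)      # assumed TT ranks

dense = dense_param_count(in_modes, out_modes)
tt = tt_param_count(in_modes, out_modes, ranks)
ratio = dense / tt
print(f"dense={dense}, tensor-train={tt}, reduction={ratio:.1f}x")
```

The reduction grows with the input dimension and shrinks as the ranks increase, so the ratio reached in practice depends on the layer shapes and ranks actually used.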
Abstract
Because video detection and classification require a huge number of parameters when exposed to high-dimensional inputs, developing a compact yet accurate video comprehension system for terminal devices is a grand challenge. Current works optimize video detection and classification separately. In this paper, we introduce DEEPEYE, a video comprehension (object detection and action recognition) system for terminal devices. Based on You Only Look Once (YOLO), we have developed an 8-bit quantization method applied when training YOLO, and a tensorized-compression method for a Recurrent Neural Network (RNN) that takes features extracted from YOLO as input. The developed quantization and tensorization can significantly compress the original network model while maintaining accuracy. Using the challenging video datasets MOMENTS and UCF11 as benchmarks, the…
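As a rough illustration of what 8-bit quantization does to a set of weights, the sketch below applies a simple symmetric uniform quantizer with one shared scale. This is an assumption made for illustration: the abstract says quantization is applied while training YOLO but does not specify the scheme, and the function names and sample weights here are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to signed 8-bit integer levels using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    levels = [max(-127, min(127, round(w / scale))) for w in weights]
    return levels, scale


def dequantize(levels, scale):
    """Recover approximate float weights from the integer levels."""
    return [v * scale for v in levels]


weights = [0.4, -1.0, 0.25, 0.0]   # hypothetical weight values
levels, scale = quantize_int8(weights)
restored = dequantize(levels, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(levels, scale, max_error)
```

Storing each weight in 8 bits instead of a 32-bit float gives a nominal 4x size reduction, which is close to the 3.994x compression listed under Findings, although the source does not state how that figure was computed.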
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Multimodal Machine Learning Applications
