ResQ: Residual Quantization for Video Perception
Davide Abati, Haitam Ben Yahia, Markus Nagel, Amirhossein Habibian

TL;DR
ResQ introduces a novel residual quantization method that leverages temporal redundancies in video to improve the efficiency and accuracy of video perception tasks like segmentation and pose estimation.
Contribution
The paper proposes Residual Quantization (ResQ), a new low-bit quantization scheme that incorporates temporal dependencies for enhanced video perception performance.
Findings
ResQ outperforms standard quantization methods in accuracy vs. bit-width trade-offs.
Adjusting the bit-width dynamically in response to changes in the video improves efficiency.
ResQ achieves superior results on semantic segmentation and pose estimation benchmarks.
Abstract
This paper accelerates video perception, such as semantic segmentation and human pose estimation, by leveraging cross-frame redundancies. Unlike existing approaches, which avoid redundant computations by warping past features using optical flow or by performing sparse convolutions on frame differences, we approach the problem from a new perspective: low-bit quantization. We observe that residuals, defined as the difference in network activations between two neighboring frames, exhibit properties that make them highly quantizable. Based on this observation, we propose a novel quantization scheme for video networks, coined Residual Quantization (ResQ). ResQ extends the standard, frame-by-frame quantization scheme by incorporating temporal dependencies that lead to better performance in terms of accuracy vs. bit-width. Furthermore, we extend our model to dynamically adjust the bit-width…
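The observation that residuals are highly quantizable can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes synthetic Gaussian activations, a simple symmetric uniform quantizer (the hypothetical helper `quantize`), an 8-bit copy of the previous frame's activations, and a 4-bit budget for the current frame. All names and numeric settings here are illustrative assumptions. The point is only that a tensor with a small dynamic range, such as a frame-to-frame residual, loses less to rounding at a given bit-width than the full activation does.

```python
import numpy as np


def quantize(x, num_bits):
    """Symmetric uniform fake-quantization with a per-tensor scale.

    Hypothetical helper for illustration; the paper's quantizer may differ.
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros_like(x)
    scale = max_abs / qmax
    # Round to the integer grid, clip to the representable range, rescale.
    return np.clip(np.round(x / scale), -qmax, qmax) * scale


rng = np.random.default_rng(0)

# Synthetic stand-ins for activations of two neighboring frames (assumption):
# the current frame equals the previous one plus a small perturbation.
prev_act = rng.normal(0.0, 1.0, size=(64, 32, 32))
curr_act = prev_act + rng.normal(0.0, 0.05, size=prev_act.shape)

# Frame-by-frame baseline: quantize the current activations directly to 4 bits.
direct = quantize(curr_act, 4)

# Residual variant: reuse an 8-bit copy of the previous activations and
# spend the 4-bit budget on the residual only.
prev_q = quantize(prev_act, 8)
residual = curr_act - prev_q
reconstructed = prev_q + quantize(residual, 4)

err_direct = float(np.mean((curr_act - direct) ** 2))
err_residual = float(np.mean((curr_act - reconstructed) ** 2))

print(f"4-bit direct MSE:   {err_direct:.6f}")
print(f"4-bit residual MSE: {err_residual:.6f}")
```

On this synthetic data the residual path reconstructs the current activations with a lower error than direct 4-bit quantization, because the quantization step size scales with the much smaller range of the residual. How large the gap is on real video depends on how much consecutive frames actually differ.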
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Advanced Vision and Imaging · Visual Attention and Saliency Detection
Methods: Sparse Convolutions
