Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence
Robert A. Cohen, Hyomin Choi, Ivan V. Bajić

TL;DR
This paper introduces a lightweight compression method for intermediate neural network features in collaborative intelligence. It requires no retraining of the network and substantially reduces the amount of data sent between device and cloud with minimal accuracy loss.
Contribution
It proposes a novel quantization technique that uses mathematical models of clipping and quantization error to choose optimal clipping ranges, enabling efficient compression of intermediate features without retraining.
Findings
Compressed activations to 0.6-0.8 bits per value
Maintained less than 1% accuracy loss
Outperformed HEVC in inference accuracy
Abstract
In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device, and the remaining portion of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN, without requiring any retraining of the network weights. Mathematical models for estimating the clipping and quantization error of ReLU and leaky-ReLU activations at this intermediate layer are developed and used to compute optimal clipping ranges for coarse quantization. We also present a modified entropy-constrained design algorithm for quantizing clipped activations. When applied to popular object-detection and classification DNNs, we were able to…
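The clip-then-quantize idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: it uses a plain uniform quantizer rather than the modified entropy-constrained design, the clipping range `c_max` and the number of levels are picked by hand rather than derived from the paper's error models, and the activations are synthetic (Gaussian values passed through a ReLU). All function names here are hypothetical.

```python
import math
import random
from collections import Counter


def clip_and_quantize(values, c_max, levels):
    """Clip each value to [0, c_max], then map it to the nearest of
    `levels` uniformly spaced reconstruction points."""
    step = c_max / (levels - 1)
    indices = [round(min(max(v, 0.0), c_max) / step) for v in values]
    return indices, step


def dequantize(indices, step):
    """Map quantizer indices back to reconstruction values."""
    return [i * step for i in indices]


def empirical_entropy_bits(indices):
    """Empirical entropy of the index stream in bits per value, a rough
    lower bound on what an ideal entropy coder would spend."""
    counts = Counter(indices)
    n = len(indices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


random.seed(0)
# Synthetic stand-in for ReLU outputs (assumption, not data from the paper).
activations = [max(0.0, random.gauss(0.0, 1.0)) for _ in range(10000)]

# Hand-picked clipping range and level count (assumption).
indices, step = clip_and_quantize(activations, c_max=2.0, levels=4)
restored = dequantize(indices, step)

mse = sum((a - r) ** 2 for a, r in zip(activations, restored)) / len(activations)
bits = empirical_entropy_bits(indices)
print(f"entropy ~ {bits:.2f} bits/value, MSE ~ {mse:.4f}")
```

Because ReLU outputs are zero for roughly half of the inputs in this synthetic setup, the index distribution is heavily skewed toward zero, which is why the entropy falls well below the 2 bits that four levels would need with fixed-length coding. Varying `c_max` trades clipping error against quantization error, which is the trade-off the paper's models are described as optimizing.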
