Lightweight compression of neural network feature tensors for collaborative intelligence
Robert A. Cohen, Hyomin Choi, Ivan V. Bajić

TL;DR
This paper introduces a lightweight compression method for the activations at the split layer of a DNN in collaborative intelligence, enabling edge-cloud split deployment with less than 1% accuracy loss.
Contribution
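The paper proposes a compression technique for DNN activations that requires no retraining and is simple enough to run on edge devices, along with a modified entropy-constrained quantizer design algorithm optimized for clipped activations. The codec is evaluated against HEVC.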
Findings
Compressed 32-bit floating-point activations to 0.6 to 0.8 bits per activation with less than 1% accuracy loss
Outperformed HEVC in inference accuracy by up to 1.3%
Applicable to popular object detection and classification DNNs
Abstract
In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a relatively low-complexity device such as a mobile phone or edge device, and the remainder of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to code the activations of a split DNN layer, while having a low complexity suitable for edge devices and not requiring any retraining. We also present a modified entropy-constrained quantizer design algorithm optimized for clipped activations. When applied to popular object-detection and classification DNNs, we were able to compress the 32-bit floating point activations down to 0.6 to 0.8 bits, while keeping the loss in accuracy to less than 1%. When compared to HEVC, we found that the lightweight codec consistently provided…
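To make the "bits per activation" figure concrete, here is a minimal sketch of clipping, quantizing, and estimating the entropy of a set of activations. It is not the authors' codec: a plain uniform quantizer stands in for the paper's modified entropy-constrained design, and the clipping range, the number of levels, and the synthetic ReLU-like data are all assumptions chosen for illustration. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def clip_and_quantize(activations, c_min, c_max, levels):
    """Clip each value to [c_min, c_max], then map it to an index in [0, levels - 1].

    Uniform quantizer used as a stand-in (assumption); the paper designs an
    entropy-constrained quantizer instead.
    """
    step = (c_max - c_min) / (levels - 1)
    indices = []
    for x in activations:
        clipped = min(max(x, c_min), c_max)
        indices.append(int(round((clipped - c_min) / step)))
    return indices, step


def dequantize(indices, c_min, step):
    """Map quantizer indices back to reconstruction values (cloud side)."""
    return [c_min + i * step for i in indices]


def empirical_entropy_bits(indices):
    """Shannon entropy of the index histogram, in bits per activation.

    This is the rate an ideal entropy coder would approach for these symbols.
    """
    counts = Counter(indices)
    n = len(indices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Synthetic stand-in for a feature tensor (assumption): ReLU-like outputs,
# mostly zero with a tail of positive values.
random.seed(0)
activations = [max(0.0, random.gauss(-1.0, 1.0)) for _ in range(10_000)]

# Hypothetical settings: clip to [0, 2] and use 4 quantizer levels.
indices, step = clip_and_quantize(activations, c_min=0.0, c_max=2.0, levels=4)
reconstructed = dequantize(indices, 0.0, step)
bits = empirical_entropy_bits(indices)

print(f"estimated rate: {bits:.3f} bits per activation (float32 uses 32)")
print(f"compression ratio vs. float32: {32 / bits:.1f}x")
```

Because most clipped activations land in a single bin, the index distribution is highly skewed, which is why an entropy coder can spend well under one bit per activation on average. For reference, 0.6 to 0.8 bits against 32-bit floats corresponds to a ratio of roughly 40x to 53x.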
