Bandwidth-efficient Inference for Neural Image Compression
Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu

TL;DR
This paper introduces a neural network inference method that compresses activations to significantly reduce bandwidth and energy consumption, enabling more efficient image compression on mobile and edge devices.
Contribution
It presents a novel end-to-end differentiable activation compression pipeline using transform, quantization, and entropy coding, optimized for neural image compression tasks.
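The quantization and entropy-modelling stages of such a pipeline can be illustrated with a minimal sketch. This is not the authors' implementation: the transform stage is omitted, and the function names, the uniform step size of 0.25, the sample activations, and the choice to fit the Gaussian mean and scale to the quantized data itself are all assumptions made for illustration. The sketch estimates how many bits an ideal arithmetic coder would spend on quantized activations under a discretized Gaussian model.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def quantize(values, step):
    """Uniform scalar quantization to integer symbols (hypothetical step size)."""
    return [round(v / step) for v in values]

def estimated_bits(symbols, mu, sigma):
    """Ideal code length in bits under a discretized Gaussian model.

    Each integer symbol q is assigned the probability mass of the
    interval [q - 0.5, q + 0.5]; its cost is -log2 of that mass.
    """
    total = 0.0
    for q in symbols:
        p = gaussian_cdf(q + 0.5, mu, sigma) - gaussian_cdf(q - 0.5, mu, sigma)
        total += -math.log2(max(p, 1e-12))  # clamp to avoid log(0)
    return total

# Made-up activation values, for illustration only.
activations = [0.02, -0.4, 1.3, 0.0, -0.1, 0.7, 0.05, -0.9]
symbols = quantize(activations, step=0.25)

# "Data-dependent" is interpreted here as fitting mu and sigma to the data;
# the paper may derive these parameters differently.
mu = sum(symbols) / len(symbols)
var = sum((s - mu) ** 2 for s in symbols) / len(symbols)
sigma = max(math.sqrt(var), 1e-3)

bits = estimated_bits(symbols, mu, sigma)
print(symbols, round(bits, 2))
```

Symbols close to the mean receive more probability mass and therefore cost fewer bits, which is why a transform that concentrates activations near zero reduces the amount of data written to external memory.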
Findings
Achieves up to 19x bandwidth reduction
Realizes 6.21x energy savings
Maintains high image compression performance
Abstract
With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices. In this paper, we propose an end-to-end differentiable, bandwidth-efficient neural inference method in which activations are compressed with a neural data compression method. Specifically, we propose a transform-quantization-entropy coding pipeline for activation compression, with symmetric exponential Golomb coding and a data-dependent Gaussian entropy model for arithmetic coding. When optimized with existing model quantization methods, the low-level task of image compression can achieve up to 19x bandwidth reduction with 6.21x energy saving.
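As a rough illustration of exponential Golomb coding applied to signed quantized activations, the sketch below uses the standard order-0 code with a sign-interleaving map (0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ...). Reading "symmetric" as this kind of sign-symmetric mapping is an assumption; the paper's exact variant may differ, and all function names here are hypothetical.

```python
def signed_to_unsigned(v):
    """Interleave signed integers: 0, 1, -1, 2, -2 -> 0, 1, 2, 3, 4."""
    return 2 * v - 1 if v > 0 else -2 * v

def unsigned_to_signed(u):
    """Inverse of signed_to_unsigned."""
    return (u + 1) // 2 if u % 2 else -(u // 2)

def exp_golomb_encode(v):
    """Order-0 exp-Golomb: binary of (u + 1), prefixed by len - 1 zeros."""
    b = bin(signed_to_unsigned(v) + 1)[2:]
    return "0" * (len(b) - 1) + b

def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    u = int(bits[pos + zeros:end], 2) - 1
    return unsigned_to_signed(u), end

def encode_stream(values):
    """Concatenate the codewords of a sequence of signed integers."""
    return "".join(exp_golomb_encode(v) for v in values)

def decode_stream(bits):
    """Decode a concatenated bit string back into signed integers."""
    values, pos = [], 0
    while pos < len(bits):
        v, pos = exp_golomb_decode(bits, pos)
        values.append(v)
    return values

# Made-up quantized activations, for illustration only.
quantized = [0, -2, 5, 0, 0, 3, 0, -4]
stream = encode_stream(quantized)
assert decode_stream(stream) == quantized
print(len(stream), "bits for", len(quantized), "symbols")
```

Small magnitudes get short codewords (zero costs a single bit), so the code suits activations that cluster around zero after the transform and quantization steps, and it needs no probability table at the decoder.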
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Neural Networks and Applications · Advanced Data Compression Techniques
