Bandwidth-efficient Inference for Neural Image Compression

Shanzhi Yin; Tongda Xu; Yongsheng Liang; Yuanyuan Wang; Yanghao Li; Yan Wang; Jingjing Liu

arXiv:2309.02855·cs.CV·September 8, 2023·1 cites

TL;DR

This paper introduces a neural network inference method that compresses activations before they are written to external memory, cutting bandwidth by up to 19x and energy by 6.21x for neural image compression on mobile and edge devices.

Contribution

It presents a novel end-to-end differentiable activation compression pipeline using transform, quantization, and entropy coding, optimized for neural image compression tasks.

Findings

01. Achieves up to 19x bandwidth reduction

02. Realizes 6.21x energy savings

03. Maintains high image compression performance

Abstract

With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices. In this paper, we propose an end-to-end differentiable, bandwidth-efficient neural inference method in which the activations are compressed by a neural data compression method. Specifically, we propose a transform-quantization-entropy coding pipeline for activation compression, with symmetric exponential Golomb coding and a data-dependent Gaussian entropy model for arithmetic coding. Combined with existing model quantization methods, the method achieves up to 19x bandwidth reduction and 6.21x energy saving on the low-level task of image compression.
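
To make the entropy coding step of the abstract more concrete, the following is a minimal sketch of order-0 exponential Golomb coding applied to already-quantized integer activations. It is an illustration under stated assumptions, not the authors' implementation: the signed-to-unsigned mapping used here is the common zigzag convention, and the paper's "symmetric" variant may differ in its details. All function names are hypothetical, and the transform and quantization stages are assumed to have already produced the integers.

```python
from typing import List, Tuple


def signed_to_unsigned(v: int) -> int:
    # Assumed zigzag mapping: 0, 1, -1, 2, -2 ... -> 0, 1, 2, 3, 4 ...
    return 2 * v - 1 if v > 0 else -2 * v


def unsigned_to_signed(u: int) -> int:
    # Inverse of the mapping above.
    return (u + 1) // 2 if u % 2 == 1 else -(u // 2)


def exp_golomb_encode(u: int) -> str:
    # Order-0 exp-Golomb: write u + 1 in binary, prefixed by
    # (bit length - 1) zeros so the decoder knows the length.
    binary = bin(u + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def exp_golomb_decode(bits: str, pos: int = 0) -> Tuple[int, int]:
    # Count leading zeros, then read that many bits plus one.
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


def encode_activations(values: List[int]) -> str:
    # Concatenate the codewords of all quantized activations.
    return "".join(exp_golomb_encode(signed_to_unsigned(v)) for v in values)


def decode_activations(bits: str, count: int) -> List[int]:
    # Read back `count` codewords from the bit string.
    out, pos = [], 0
    for _ in range(count):
        u, pos = exp_golomb_decode(bits, pos)
        out.append(unsigned_to_signed(u))
    return out
```

Because small magnitudes receive short codewords, a code of this kind suits activations that cluster around zero after the transform and quantization steps; the bit string is used here only for readability, where a real implementation would pack bits.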

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Neural Networks and Applications · Advanced Data Compression Techniques