Trainable Bitwise Soft Quantization for Input Feature Compression
Karsten Schrödter, Jan Stenkamp, Nina Herrmann, Fabian Gieseke

TL;DR
This paper introduces a trainable, bitwise soft quantization layer that compresses input features for neural networks, reducing data transfer needs in IoT applications while maintaining high accuracy.
Contribution
It proposes a novel trainable quantization method using sigmoid-approximated step functions for efficient feature compression in neural networks.
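To illustrate the idea of a sigmoid-approximated step function, here is a minimal sketch, not the authors' implementation. It assumes a single feature quantized to 2 bits, so there are 2**2 - 1 = 3 thresholds. The names `soft_quantize`, `hard_quantize`, `thresholds`, and `temperature` are hypothetical and chosen for illustration only. In a trainable layer the thresholds would be learned parameters; here they are fixed constants.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_quantize(x, thresholds, temperature):
    """Smooth staircase: one sigmoid per threshold, summed.

    A smaller temperature makes each sigmoid steeper, so the output
    approaches the hard step function while staying differentiable.
    """
    return sum(sigmoid((x - t) / temperature) for t in thresholds)


def hard_quantize(x, thresholds):
    """Hard staircase: count how many thresholds x exceeds."""
    return sum(1 for t in thresholds if x > t)


# Assumed example values: 2 bits per feature -> 3 thresholds.
thresholds = [-1.0, 0.0, 1.0]

for x in (-2.0, -0.5, 0.5, 2.0):
    soft = soft_quantize(x, thresholds, temperature=0.05)
    hard = hard_quantize(x, thresholds)
    print(f"x={x:+.1f}  soft={soft:.4f}  hard={hard}")
```

Away from the thresholds, the soft output stays within a small tolerance of the integer level produced by the hard version, which is what allows gradients to flow during training while the deployed quantizer remains a simple comparison.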
Findings
Achieves 5x to 16x data compression with minimal accuracy loss.
Outperforms standard quantization baselines in experiments.
Maintains accuracy close to full-precision models.
Abstract
The growing demand for machine learning applications in the context of the Internet of Things calls for new approaches to optimize the use of limited compute and memory resources. Despite significant progress in reducing model sizes and improving efficiency, many applications still require remote servers to provide the required resources. However, such approaches rely on transmitting data from edge devices to remote servers, which may not always be feasible due to bandwidth, latency, or energy constraints. We propose a task-specific, trainable feature quantization layer that compresses the input features of a neural network. This can significantly reduce the amount of data that needs to be transferred from the device to a remote server. In particular, the layer allows each input feature to be quantized to a user-defined number of bits, enabling a simple on-device…
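As a rough sketch of what quantizing each feature to a fixed number of bits could look like on a device, the example below compares each feature against a set of thresholds and packs the resulting integer codes into bytes. This is an illustration under stated assumptions, not the paper's procedure: the helper names (`quantize_to_code`, `pack_codes`, `unpack_codes`), the shared thresholds, the 2-bit width, and the 32-bit float baseline used for the size comparison are all hypothetical choices.

```python
def quantize_to_code(x, thresholds):
    """Map a feature to an integer code by counting exceeded thresholds."""
    return sum(1 for t in thresholds if x > t)


def pack_codes(codes, bits):
    """Pack integer codes, `bits` each, into a big-endian byte string."""
    acc = 0
    for c in codes:
        if not 0 <= c < (1 << bits):
            raise ValueError(f"code {c} does not fit in {bits} bits")
        acc = (acc << bits) | c
    n_bytes = (bits * len(codes) + 7) // 8  # round up to whole bytes
    return acc.to_bytes(n_bytes, "big")


def unpack_codes(payload, bits, count):
    """Recover `count` codes from a payload produced by pack_codes."""
    acc = int.from_bytes(payload, "big")
    mask = (1 << bits) - 1
    return [(acc >> (bits * (count - 1 - i))) & mask for i in range(count)]


# Assumed example: four features, 2 bits each, same thresholds for all.
bits = 2
thresholds = [-1.0, 0.0, 1.0]  # 2**bits - 1 thresholds
features = [0.1, -1.7, 2.3, 0.4]

codes = [quantize_to_code(x, thresholds) for x in features]
payload = pack_codes(codes, bits)

raw_bytes = 4 * len(features)  # assumption: features stored as 32-bit floats
print("codes:", codes)
print("payload bytes:", len(payload), "vs raw bytes:", raw_bytes)
print("compression ratio:", raw_bytes / len(payload))
```

Under these assumptions the four features fit into a single byte instead of 16, and the server can recover the codes with `unpack_codes(payload, bits, len(features))`. The ratio depends entirely on the chosen bit width and the baseline representation.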
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
