Quantization Aware Factorization for Deep Neural Network Compression
Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak

TL;DR
This paper introduces a tensor decomposition method that applies quantization during the factorization process itself. The result is neural network compression with minimal accuracy loss, which makes the approach suitable for resource-constrained devices.
Contribution
It develops an ADMM-based algorithm for quantized tensor factorization, combining compression and quantization in a unified framework for neural networks.
Findings
Achieves accuracy competitive with state-of-the-art quantization methods.
Demonstrates high flexibility in balancing quality and performance.
Reduces model size and computational cost effectively.
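To make the size reduction concrete, the arithmetic can be sketched as below. This is only an illustration under stated assumptions: the layer shape, the rank, and the choice to treat the convolution kernel as a 3-way tensor of shape (c_out, c_in, k*k) are hypothetical and are not taken from the paper. The helper name `cp_storage` is also invented here.

```python
def cp_storage(c_out, c_in, k, rank, factor_bits=8, dense_bits=32):
    """Compare storage of a dense conv kernel with its rank-`rank` CP factors.

    Assumption (not from the paper): the kernel is reshaped to a 3-way
    tensor (c_out, c_in, k*k), so CP stores three factor matrices.
    """
    dense_params = c_out * c_in * k * k
    cp_params = rank * (c_out + c_in + k * k)
    dense_bytes = dense_params * dense_bits / 8
    cp_bytes = cp_params * factor_bits / 8
    return {
        "dense_params": dense_params,
        "cp_params": cp_params,
        "param_ratio": dense_params / cp_params,   # gain from factorization alone
        "byte_ratio": dense_bytes / cp_bytes,      # gain from factorization + quantization
    }


# Hypothetical 3x3 convolution with 256 input and 256 output channels.
stats = cp_storage(c_out=256, c_in=256, k=3, rank=128)
print(stats)
```

The two ratios separate the saving that comes from factorization alone from the additional saving obtained by storing the factors at a lower bit width.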
Abstract
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks. Due to memory and power consumption limitations of mobile or embedded devices, the quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds a tensor approximation directly with quantized factors and thus benefits from both compression techniques while keeping the prediction quality of the model. Namely, we propose to use Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with a devised algorithm and evaluate its prediction…
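The idea of running ADMM for a CP decomposition whose factors lie on a quantization grid can be sketched as follows. This is a minimal NumPy illustration for a 3-way tensor, not the authors' algorithm: the uniform symmetric grid with a per-factor scale, the penalty `rho`, the random initialization, and all function names are assumptions made for this example.

```python
import numpy as np


def project_to_grid(M, bits=8):
    """Round M to a uniform symmetric grid (assumed grid; scale from max |M|)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(M).max() / qmax
    if scale == 0:
        return M.copy()
    q = np.clip(np.round(M / scale), -qmax - 1, qmax)
    return q * scale


def khatri_rao(P, Q):
    """Column-wise Kronecker product; row j*K + k holds P[j, :] * Q[k, :]."""
    return (P[:, None, :] * Q[None, :, :]).reshape(-1, P.shape[1])


def admm_quantized_cp(X, rank, bits=8, rho=1.0, n_iter=50, seed=0):
    """Sketch of ADMM for CP: A are free factors, Z their quantized copies,
    U the scaled dual variables enforcing A = Z."""
    rng = np.random.default_rng(seed)
    A = [rng.standard_normal((d, rank)) for d in X.shape]
    Z = [project_to_grid(F, bits) for F in A]
    U = [np.zeros_like(F) for F in A]
    for _ in range(n_iter):
        for n in range(3):
            # Khatri-Rao product of the other two (quantized) factors,
            # ordered to match the row-major mode-n unfolding below.
            others = [Z[m] for m in range(3) if m != n]
            KR = khatri_rao(others[0], others[1])
            Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
            # Regularized least squares:
            # argmin_A 0.5*||Xn - A KR^T||^2 + rho/2*||A - Z + U||^2
            G = KR.T @ KR + rho * np.eye(rank)
            rhs = Xn @ KR + rho * (Z[n] - U[n])
            A[n] = np.linalg.solve(G, rhs.T).T
            # Projection onto the grid, then dual update.
            Z[n] = project_to_grid(A[n] + U[n], bits)
            U[n] += A[n] - Z[n]
    return Z


# Usage on a synthetic low-rank tensor (shapes chosen arbitrarily).
rng = np.random.default_rng(1)
true = [rng.standard_normal((d, 3)) for d in (6, 5, 4)]
X = np.einsum("ir,jr,kr->ijk", *true)
Zq = admm_quantized_cp(X, rank=3, bits=4)
X_hat = np.einsum("ir,jr,kr->ijk", *Zq)
print("relative error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

In this sketch each least-squares step uses the quantized copies of the other factors, so the returned factors are the ones the objective is actually fitted with. Convergence behaviour depends on `rho`, the bit width, and the initialization, none of which are tuned here.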
Taxonomy
Topics: Tensor decomposition and applications · Advanced Neural Network Applications · Computational Physics and Python Applications
