TL;DR
This paper presents a safety-guided quantization framework for neural networks that shrinks models to 60% of their original size while improving test accuracy by up to 2.5%. It is evaluated on a convolutional neural network and an attention-based language model.
Contribution
Introduces a safety-driven quantization method that uses preservation sets to prune and quantize neural network weights without compromising accuracy.
Findings
Achieves up to a 2.5% improvement in test accuracy over the original unquantized models.
Retains 60% of the initial model size (a 40% reduction) while maintaining performance.
Enhances generalization and reduces variance compared to traditional quantization.
Abstract
The deployment of deep neural networks on resource-constrained devices necessitates effective model compression strategies that judiciously balance the reduction of model size with the preservation of performance. This study introduces a novel safety-driven quantization framework that leverages preservation sets to systematically prune and quantize neural network weights, thereby optimizing model complexity without compromising accuracy. The proposed methodology is rigorously evaluated on both a convolutional neural network (CNN) and an attention-based language model, demonstrating its applicability across diverse architectural paradigms. Experimental results reveal that our framework achieves up to a 2.5% enhancement in test accuracy relative to the original unquantized models while maintaining 60% of the initial model size. In comparison to conventional quantization techniques, our…
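To make the idea of quantizing around a preservation set concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the abstract does not say how preservation sets are chosen, so this example assumes (hypothetically) that the largest-magnitude weights are preserved at full precision and every other weight is rounded to a symmetric uniform grid. The function name `quantize_with_preservation` and the parameters `preserve_frac` and `num_bits` are illustrative only.

```python
import numpy as np


def quantize_with_preservation(weights, preserve_frac=0.1, num_bits=8):
    """Quantize weights uniformly, leaving a preserved subset untouched.

    ASSUMPTION: the preserved subset is picked by weight magnitude.
    The paper's actual selection criterion is not given in the abstract.
    Returns (quantized_weights, preservation_mask).
    """
    w = np.asarray(weights, dtype=np.float32)
    magnitudes = np.abs(w).ravel()

    # Mark the k largest-magnitude weights as the preservation set.
    k = int(round(preserve_frac * magnitudes.size))
    flat_mask = np.zeros(magnitudes.size, dtype=bool)
    if k > 0:
        flat_mask[np.argpartition(magnitudes, -k)[-k:]] = True
    mask = flat_mask.reshape(w.shape)

    # Symmetric uniform quantization, scaled to the non-preserved weights.
    rest = w[~mask]
    max_abs = float(np.max(np.abs(rest))) if rest.size else 0.0
    levels = 2 ** (num_bits - 1) - 1
    scale = max_abs / levels if max_abs > 0 else 1.0
    quantized = np.clip(np.round(w / scale), -levels, levels) * scale

    # Preserved weights keep their original values.
    return np.where(mask, w, quantized).astype(np.float32), mask
```

Because the quantization scale is computed only from the non-preserved weights, a few large outliers in the preserved set do not stretch the grid, so the remaining weights are rounded with a finer step than plain uniform quantization would give.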
