Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models
Aayush Saxena, Arit Kumar Bishwas, Ayush Ashok Mishra, Ryan Armstrong

TL;DR
This paper evaluates and compares model compression techniques such as quantization and pruning across different deep learning models, including large language models, and reports their impact on model size, accuracy, and inference time.
Contribution
It provides an extensive analysis of compression methods applied to diverse models, bridging traditional deep learning and large language models with practical performance insights.
Findings
Quantization and pruning significantly reduce model size and inference time.
The accuracy impact of compression varies with the method and the model type.
Challenges remain in balancing compression efficiency and maintaining high accuracy.
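The two techniques named in the findings can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example weights, the 50% sparsity level, and the choice of symmetric per-tensor int8 quantization are all assumptions made for illustration.

```python
# Illustrative sketch only; names and values are hypothetical, not from the paper.

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization: each weight is stored as an
    integer q in [-127, 127] so that weight ~= scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]


weights = [0.82, -0.05, 0.31, -1.27, 0.002, 0.64, -0.18, 0.09]

pruned = magnitude_prune(weights, sparsity=0.5)   # half the weights become 0
q, scale = quantize_int8(pruned)                  # 8-bit codes plus one float scale
restored = dequantize(q, scale)

# Rounding to the nearest code bounds the per-weight error by scale / 2.
max_err = max(abs(a - b) for a, b in zip(pruned, restored))
print(pruned)
print(q, scale)
print(max_err)
```

Pruning shrinks the model by making weights zero (which sparse storage can exploit), while quantization shrinks it by storing each remaining weight in 8 bits instead of 32. The reconstruction error printed at the end is one simple proxy for the accuracy cost the findings refer to.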
Abstract
Deep learning models have achieved tremendous success across most industries in recent years. Their evolution has also increased model size and energy requirements, making them difficult to deploy in production on low-compute devices. The growing number of connected devices around the world warrants compressed models that can be easily deployed on local devices with limited compute capacity and power. Researchers have proposed a wide range of solutions to reduce the size and complexity of such models; prominent among them are weight quantization, parameter pruning, network pruning, low-rank representation, weight sharing, neural architecture search, and knowledge distillation. In this research work, we investigate the performance impacts on various trained deep learning models, compressed using quantization and…
Taxonomy
Methods: Pruning · Knowledge Distillation
