Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models

Aayush Saxena; Arit Kumar Bishwas; Ayush Ashok Mishra; Ryan Armstrong

arXiv:2407.15904·cs.LG·July 24, 2024

TL;DR

This paper comprehensively evaluates and compares various model compression techniques like quantization and pruning across different deep learning models, including large language models, highlighting their impacts on size, accuracy, and inference time.

Contribution

It provides an extensive analysis of compression methods applied to diverse models, bridging traditional deep learning and large language models with practical performance insights.

Findings

01

Quantization and pruning significantly reduce model size and inference time.

02

The effect of compression on model accuracy varies with the method and the model type.

03

Challenges remain in balancing compression efficiency against high accuracy.
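
To make the two techniques named in the findings concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it assumes symmetric per-tensor int8 quantization and unstructured magnitude pruning, and the function names (`quantize_int8`, `dequantize`, `magnitude_prune`) and the sample weights are hypothetical, chosen only for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q,
    with q an integer in [-127, 127]. (Assumed scheme, for illustration.)"""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integer codes back to approximate float weights."""
    return [scale * v for v in q]


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical weights standing in for one layer of a trained model.
weights = [0.82, -0.05, 0.33, -1.27, 0.01, 0.64, -0.48, 0.09]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Worst-case rounding error introduced by quantization.
max_err = max(abs(a - b) for a, b in zip(weights, restored))

# Drop the half of the weights that contribute least by magnitude.
pruned = magnitude_prune(weights, 0.5)
```

Storing each weight as an int8 code takes one byte instead of the four bytes of a float32, which is where the size reduction from quantization comes from, while `max_err` shows the approximation error that can affect accuracy. Pruning trades accuracy for sparsity in a similar way: the zeroed weights only save space or time if the storage format or runtime exploits the sparsity.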

Abstract

Deep learning models have achieved tremendous success across most industries in recent years. The evolution of these models has also increased their size and energy requirements, making them difficult to deploy in production on low-compute devices. The growing number of connected devices around the world warrants compressed models that can be easily deployed on local devices with limited compute capacity and power availability. Researchers have proposed a wide range of solutions to reduce the size and complexity of such models; prominent among them are weight quantization, parameter pruning, network pruning, low-rank representation, weight sharing, neural architecture search, and knowledge distillation. In this research work, we investigate the performance impacts on various trained deep learning models, compressed using quantization and…

Taxonomy

Methods: Pruning · Knowledge Distillation