Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward
Arnav Chavan, Raghav Magazine, Shubham Kushwaha, Mérouane Debbah, and Deepak Gupta

TL;DR
This paper surveys recent methods for compressing and optimizing large language models to reduce their computational and memory demands, supported by empirical evaluations on LLaMA(/2)-7B and a discussion of future research directions.
Contribution
It provides a comprehensive overview of current model compression and system optimization techniques for LLMs, including empirical evaluation and practical insights.
Findings
Compression techniques reduce the computational and memory cost of LLM inference.
Experiments on LLaMA(/2)-7B demonstrate the effectiveness of these methods in a unified setting.
The survey identifies current limitations and future research directions.
Abstract
Despite the impressive performance of LLMs, their widespread adoption faces challenges due to substantial computational and memory requirements during inference. Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference. This survey offers an overview of these methods, emphasizing recent developments. Through experiments on LLaMA(/2)-7B, we evaluate various compression techniques, providing practical insights for efficient LLM deployment in a unified setting. The empirical analysis on LLaMA(/2)-7B highlights the effectiveness of these methods. Drawing from survey insights, we identify current limitations and discuss potential future directions to improve LLM inference efficiency. We release the codebase to reproduce the results presented in this paper at https://github.com/nyunAI/Faster-LLM-Survey
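To give a rough sense of the memory requirements the abstract refers to, the sketch below estimates the storage needed for the weights of a 7B-parameter model at several bit widths. It is a back-of-the-envelope illustration, not a result from the paper: the round figure of 7e9 parameters, the chosen bit widths, and the helper names `weight_memory_gb` and `footprint_table` are assumptions made here, and the estimate ignores activations, the KV cache, and any quantization metadata.

```python
# Back-of-the-envelope weight memory for a 7B-parameter model.
# Assumption: a round 7e9 parameters; weights only, no activations or KV cache.
BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(num_params: float, bits: int) -> float:
    """Return the memory needed to store the weights, in GB (10^9 bytes)."""
    return num_params * bits / 8 / 1e9


def footprint_table(num_params: float = 7e9) -> dict:
    """Map each illustrative precision name to its weight footprint in GB."""
    return {
        name: weight_memory_gb(num_params, bits)
        for name, bits in BITS_PER_WEIGHT.items()
    }


if __name__ == "__main__":
    for name, gb in footprint_table().items():
        print(f"{name}: {gb:.1f} GB")
```

Under these assumptions, halving the bit width halves the weight footprint, which is the basic motivation for the quantization methods a survey of this kind covers.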
Taxonomy
Topics: IoT and Edge/Fog Computing · Cloud Computing and Resource Management · Digital Rights Management and Security
