Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes
Mohammadsajad Alipour, Mohammad Mohammadi Amiri

TL;DR
This paper introduces an efficient method for storing fine-tuned large language models by combining low-rank approximation and sparsification, yielding a better trade-off between storage and accuracy.
Contribution
We propose optimal singular damage, a novel approach that selectively sparsifies low-rank approximations to improve storage efficiency and model performance.
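As a rough illustration of the general idea (not the paper's actual selection rule, which this summary does not spell out), the sketch below takes a synthetic fine-tuning update, computes a truncated SVD, and then sparsifies the resulting factors by plain magnitude pruning. The matrix sizes, the rank, the 50% keep ratio, and the helper names `truncated_svd_factors` and `keep_largest` are all assumptions chosen for the example.

```python
import numpy as np

def truncated_svd_factors(delta, rank):
    """Return A (m x rank) and B (rank x n) such that A @ B approximates delta."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    root = np.sqrt(s[:rank])  # split each singular value evenly across both factors
    return u[:, :rank] * root, root[:, None] * vt[:rank, :]

def keep_largest(matrix, keep):
    """Zero all but the `keep` largest-magnitude entries (ties may keep extras)."""
    flat = np.abs(matrix).ravel()
    if keep <= 0:
        return np.zeros_like(matrix)
    if keep >= flat.size:
        return matrix.copy()
    kth = flat.size - keep
    threshold = np.partition(flat, kth)[kth]
    return np.where(np.abs(matrix) >= threshold, matrix, 0.0)

# Synthetic stand-in for a fine-tuning update (assumption: exactly rank 4).
rng = np.random.default_rng(0)
m, n, rank = 64, 48, 4
delta = 0.01 * rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))

# Step 1: low-rank approximation of the update.
a, b = truncated_svd_factors(delta, rank)

# Step 2: sparsify the factors (simple magnitude pruning, used here as a placeholder).
a_sparse = keep_largest(a, a.size // 2)
b_sparse = keep_largest(b, b.size // 2)

# Compare storage (values only, ignoring index overhead) and reconstruction error.
dense_values = delta.size
lowrank_values = a.size + b.size
stored = np.count_nonzero(a_sparse) + np.count_nonzero(b_sparse)
rel_err = np.linalg.norm(delta - a_sparse @ b_sparse) / np.linalg.norm(delta)

print(f"dense update: {dense_values} values")
print(f"low-rank factors: {lowrank_values} values")
print(f"sparsified factors: {stored} nonzero values")
print(f"relative reconstruction error: {rel_err:.3f}")
```

The value counts ignore the cost of storing the positions of the nonzeros, which a real sparse format would have to pay. A practical comparison would therefore fix a storage budget that includes index overhead and compare a dense low-rank approximation against a sparsified one of higher rank.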
Findings
Outperforms standard low-rank methods in storage and accuracy
Sparsified low-rank approximations retain critical model components
Significant storage savings with maintained model expressivity
Abstract
Large language models (LLMs) are increasingly prevalent across diverse applications. However, their enormous size limits storage and processing capabilities to a few well-resourced stakeholders. As a result, most applications rely on pre-trained LLMs, fine-tuned for specific tasks. Yet even storing the fine-tuned versions of these models remains a significant challenge due to the wide range of tasks they address. Recent studies show that fine-tuning these models primarily affects a small fraction of parameters, highlighting the need for more efficient storage of fine-tuned models. This paper focuses on efficient storage of parameter updates in pre-trained models after fine-tuning. To address this challenge, we leverage the observation that fine-tuning updates are both low-rank and sparse, which can be utilized for storage efficiency. However, using only low-rank approximation or…
Taxonomy
Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Healthcare and Education
