Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes

Mohammadsajad Alipour; Mohammad Mohammadi Amiri

arXiv:2511.02681·cs.CL·November 5, 2025

TL;DR

This paper introduces an efficient method for storing fine-tuned large language models that combines low-rank approximation with sparsification, yielding a better trade-off between storage and accuracy.

Contribution

We propose optimal singular damage, a novel approach that selectively sparsifies low-rank approximations to improve storage efficiency and model performance.
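
To make the general idea concrete, here is a minimal NumPy sketch of "low-rank approximation followed by sparsification" applied to a fine-tuning update. It is not the authors' optimal singular damage algorithm: it uses plain magnitude pruning as a stand-in for the paper's selective criterion, and the function names (`keep_largest`, `sparse_low_rank_update`) and parameters (`rank`, `keep_fraction`) are hypothetical, chosen only for illustration.

```python
import numpy as np


def keep_largest(matrix, keep_fraction):
    """Zero all entries except the largest-magnitude `keep_fraction` of them.

    Assumption: plain magnitude pruning, used here only as a placeholder
    for whatever selection rule a real method would apply.
    """
    k = max(1, int(round(keep_fraction * matrix.size)))
    magnitudes = np.abs(matrix).ravel()
    # Value of the k-th largest magnitude (ties may keep a few extra entries).
    kth = magnitudes.size - k
    threshold = np.partition(magnitudes, kth)[kth]
    return np.where(np.abs(matrix) >= threshold, matrix, 0.0)


def sparse_low_rank_update(w_pre, w_ft, rank, keep_fraction):
    """Return sparse factors (a, b) such that a @ b approximates w_ft - w_pre."""
    delta = w_ft - w_pre
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Truncate to the top `rank` singular directions and fold the
    # singular values into the left factor.
    a = u[:, :rank] * s[:rank]
    b = vt[:rank, :]
    # Sparsify each factor independently.
    return keep_largest(a, keep_fraction), keep_largest(b, keep_fraction)
```

Under these assumptions, only the nonzero entries of `a` and `b` (plus their positions) would need to be stored, and the fine-tuned weights would be approximated as `w_pre + a @ b`. How to trade a higher rank against a sparser pair of factors for a fixed storage budget is the kind of question the paper addresses; this sketch does not attempt to answer it.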

Findings

01. Outperforms standard low-rank methods in storage and accuracy

02. Sparsified low-rank approximations retain critical model components

03. Significant storage savings with maintained model expressivity

Abstract

Large language models (LLMs) are increasingly prevalent across diverse applications. However, their enormous size limits storage and processing capabilities to a few well-resourced stakeholders. As a result, most applications rely on pre-trained LLMs, fine-tuned for specific tasks. Yet even storing the fine-tuned versions of these models remains a significant challenge due to the wide range of tasks they address. Recent studies show that fine-tuning these models primarily affects a small fraction of parameters, highlighting the need for more efficient storage of fine-tuned models. This paper focuses on efficient storage of parameter updates in pre-trained models after fine-tuning. To address this challenge, we leverage the observation that fine-tuning updates are both low-rank and sparse, which can be utilized for storage efficiency. However, using only low-rank approximation or…

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Healthcare and Education