Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models
Ryan Solgi, Kai Zhen, Rupak Vignesh Swaminathan, Nathan Susanj, Athanasios Mouchtaris, Siegfried Kunzmann, Zheng Zhang

TL;DR
This paper introduces Saten, a novel sparse augmented tensor network method that significantly improves post-training compression and accuracy of large language models, enabling efficient deployment on resource-limited devices.
Contribution
Saten is a new framework that enhances tensorized LLMs with sparsity, allowing full model compression and improved performance without access to pretraining data.
Findings
Saten achieves state-of-the-art accuracy and compression efficiency among tensorized language models.
Saten improves the performance of low-rank tensorized LLMs during fine-tuning.
Experimental results validate the effectiveness of Saten.
Abstract
The efficient implementation of large language models (LLMs) is crucial for deployment on resource-constrained devices. Low-rank tensor compression techniques, such as tensor-train (TT) networks, have been widely studied for over-parameterized neural networks. However, applying them to compress pre-trained LLMs for downstream tasks (post-training) remains challenging due to the high-rank nature of pre-trained LLMs and the lack of access to pretraining data. In this study, we investigate low-rank tensorized LLMs during fine-tuning and propose sparse augmented tensor networks (Saten) to enhance their performance. The proposed Saten framework enables full model compression. Experimental results demonstrate that Saten enhances both accuracy and compression efficiency in tensorized language models, achieving state-of-the-art performance.
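The idea of augmenting a low-rank approximation with a sparse term can be illustrated on a single weight matrix. The sketch below is not the Saten algorithm from the paper: it swaps the tensor-train network for a plain truncated SVD and picks the sparse term by keeping the largest-magnitude entries of the residual. The function name `low_rank_plus_sparse` and the parameters `rank` and `keep_fraction` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def low_rank_plus_sparse(W, rank, keep_fraction):
    """Split W into a rank-`rank` part L and a sparse correction S.

    Illustrative assumption: truncated SVD stands in for a tensor-train
    factorization, and S keeps the largest-magnitude residual entries.
    """
    # Best rank-`rank` approximation of W in the Frobenius norm.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]

    # Residual that the low-rank part fails to capture.
    R = W - L

    # Keep only the k largest-magnitude residual entries in S.
    k = int(keep_fraction * R.size)
    S = np.zeros_like(R)
    if k > 0:
        idx = np.argpartition(np.abs(R).ravel(), -k)[-k:]
        S.flat[idx] = R.flat[idx]
    return L, S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((32, 24))  # stand-in for a high-rank weight matrix
    L, S = low_rank_plus_sparse(W, rank=4, keep_fraction=0.1)

    err_low_rank = np.linalg.norm(W - L)
    err_augmented = np.linalg.norm(W - (L + S))
    print("low-rank only error:   ", err_low_rank)
    print("low-rank + sparse error:", err_augmented)
    print("nonzeros in S:", np.count_nonzero(S), "of", S.size)
```

Because S copies entries of the residual exactly, adding it never increases the Frobenius error of the low-rank part. This is one way to see why a sparse term can help when the matrix being compressed is far from low-rank, as the abstract says is the case for pre-trained LLMs.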
Taxonomy
Topics: Topic Modeling · Tensor decomposition and applications · Advanced Neural Network Applications
