TensorGPT: Efficient Compression of Large Language Models based on Tensor-Train Decomposition
Mingxue Xu, Yao Lei Xu, Danilo P. Mandic

TL;DR
TensorGPT introduces a training-free compression method for large language models based on Tensor-Train Decomposition. It compresses embedding layers by up to 65.64x while keeping language task performance comparable, with the aim of enabling deployment on low-end devices.
Contribution
The paper presents a novel tensor decomposition-based compression technique for LLMs that requires no additional training and effectively reduces embedding layer parameters.
Findings
Achieves up to 65.64x compression of embedding layers.
Reduces total model parameters by approximately 46.89%.
Maintains comparable language task performance after compression.
Abstract
High-dimensional token embeddings underpin Large Language Models (LLMs), as they can capture subtle semantic information and significantly enhance the modelling of complex language patterns. However, this high dimensionality also introduces a considerable number of model parameters and prohibitively high storage and memory requirements, which are particularly unaffordable for low-end devices. Targeting settings with no extra training data and limited computation, we propose a training-free model compression approach based on the Tensor-Train Decomposition (TTD), whereby each pre-trained token embedding is converted into a lower-dimensional Matrix Product State (MPS). We then comprehensively investigate the low-rank structures extracted by this approach, in terms of the compression ratio, the language task performance, and latency on a typical low-end device (i.e. Raspberry Pi). Taking GPT family…
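To make the idea of converting a single token embedding into Tensor-Train cores concrete, here is a minimal NumPy sketch of the standard TT-SVD procedure (sequential truncated SVDs). It is an illustration under stated assumptions, not the authors' implementation: the function names `tt_svd` and `tt_reconstruct`, the tensor shape `(4, 4, 4, 12)`, the rank cap, and the random vector standing in for a pre-trained 768-dimensional embedding are all hypothetical choices made for this example.

```python
import numpy as np

def tt_svd(vec, dims, max_rank):
    """Split a 1-D vector (viewed as a tensor of shape `dims`) into TT cores.

    Each core has shape (r_prev, n_k, r_next); ranks are capped at `max_rank`.
    """
    assert vec.size == int(np.prod(dims)), "dims must factorise the vector length"
    cores = []
    rank = 1
    mat = np.asarray(vec, dtype=float)
    for n in dims[:-1]:
        # Unfold: rows index (previous rank, current mode), columns the rest.
        mat = mat.reshape(rank * n, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(rank, n, r))
        # Carry the truncated remainder forward to the next mode.
        mat = s[:r, None] * vt[:r]
        rank = r
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract TT cores back into a flat vector."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape(-1)

# Hypothetical stand-in for one pre-trained token embedding of length 768.
rng = np.random.default_rng(0)
emb = rng.standard_normal(768)
dims = (4, 4, 4, 12)  # assumed factorisation: 4 * 4 * 4 * 12 = 768

cores = tt_svd(emb, dims, max_rank=2)
approx = tt_reconstruct(cores)

n_params = sum(core.size for core in cores)
rel_err = np.linalg.norm(emb - approx) / np.linalg.norm(emb)
print(f"parameters: {emb.size} -> {n_params}, relative error: {rel_err:.3f}")
```

A random vector has no low-rank structure, so the reconstruction error here will be large; the sketch only shows the mechanics and the parameter count. How well real embeddings compress at a given rank is the empirical question the paper investigates.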
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Residual Connection · Softmax · Dense Connections · Dropout · Layer Normalization
