TensorGPT: Efficient Compression of Large Language Models based on Tensor-Train Decomposition

Mingxue Xu; Yao Lei Xu; Danilo P. Mandic

arXiv:2307.00526 · cs.CL · October 7, 2024 · 6 citations

TL;DR

TensorGPT introduces a training-free compression method for large language models using Tensor-Train Decomposition, significantly reducing model size while maintaining performance, enabling deployment on low-end devices.

Contribution

The paper presents a novel tensor-decomposition-based compression technique for LLMs that requires no additional training and reduces the number of embedding-layer parameters.

Findings

1. Achieves up to 65.64x compression of embedding layers.
2. Reduces total model parameters by approximately 46.89%.
3. Maintains comparable language task performance after compression.
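Figures like these come from simple parameter counting. A Tensor-Train (MPS) with mode sizes n_k and ranks r_k stores the sum of r_{k-1} * n_k * r_k values instead of the product of the n_k. If only the embedding layer, holding a share f of all parameters, is compressed by a factor c, the total parameter count drops by f * (1 - 1/c). The sketch below encodes both formulas. The mode sizes, ranks, embedding share, and function names are hypothetical illustration values, not the paper's configuration.

```python
from math import prod

def tt_param_count(mode_sizes, ranks):
    """Number of stored values in a TT/MPS whose cores have shape (r_{k-1}, n_k, r_k).

    `ranks` must have length len(mode_sizes) + 1, with boundary ranks r_0 = r_N = 1.
    """
    if len(ranks) != len(mode_sizes) + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("ranks must be [1, r_1, ..., r_{N-1}, 1]")
    return sum(ranks[k] * n * ranks[k + 1] for k, n in enumerate(mode_sizes))

def compression_ratio(mode_sizes, ranks):
    """Dense size (product of mode sizes) divided by the TT size."""
    return prod(mode_sizes) / tt_param_count(mode_sizes, ranks)

def total_param_reduction(embedding_share, embedding_ratio):
    """Fraction of all model parameters removed when only the embedding layer,
    which holds `embedding_share` of the parameters, is compressed by `embedding_ratio`."""
    return embedding_share * (1.0 - 1.0 / embedding_ratio)

# Hypothetical example: a 768-dim embedding folded into modes (4, 4, 4, 4, 3).
modes = (4, 4, 4, 4, 3)
ranks = (1, 2, 2, 2, 2, 1)
print(tt_param_count(modes, ranks))                     # 62
print(round(compression_ratio(modes, ranks), 2))        # 12.39
# Hypothetical embedding share of 30% of all parameters.
print(round(total_param_reduction(0.3, compression_ratio(modes, ranks)), 4))  # 0.2758
```

The second formula shows why total-model savings are bounded by the embedding layer's share f: even an arbitrarily large embedding compression ratio cannot remove more than f of the parameters.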

Abstract

High-dimensional token embeddings underpin Large Language Models (LLMs), as they can capture subtle semantic information and significantly enhance the modelling of complex language patterns. However, this high dimensionality also introduces a considerable number of model parameters and prohibitively high storage and memory requirements, which are particularly unaffordable for low-end devices. Targeting scenarios with no extra training data and insufficient computational resources, we propose a training-free model compression approach based on the Tensor-Train Decomposition (TTD), whereby each pre-trained token embedding is converted into a lower-dimensional Matrix Product State (MPS). We then comprehensively investigate the low-rank structures extracted by this approach in terms of compression ratio, language task performance, and latency on a typical low-end device (i.e. a Raspberry Pi). Taking GPT family…
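Converting a single token embedding into an MPS can be illustrated with the standard TT-SVD algorithm, which reshapes the vector into a higher-order tensor and peels off one core at a time with truncated SVDs. This is a minimal sketch under stated assumptions, not the paper's code. The mode factorization (4, 4, 4, 4, 3) of a 768-dimensional vector, the single rank cap `max_rank`, and the helper names `tt_svd` and `tt_to_vector` are hypothetical. The sketch also assumes NumPy is available.

```python
import numpy as np

def tt_svd(vector, shape, max_rank):
    """Decompose a 1-D vector into TT (MPS) cores via sequential truncated SVDs.

    Assumes len(vector) == prod(shape). Each rank is capped at `max_rank`
    (hypothetical truncation rule). Returns cores of shape (r_{k-1}, n_k, r_k)
    with boundary ranks r_0 = r_N = 1.
    """
    remainder = np.asarray(vector, dtype=float).reshape(shape)
    cores = []
    rank_prev = 1
    for n_k in shape[:-1]:
        # Unfold: rows index (previous rank, current mode), columns index the remaining modes.
        mat = remainder.reshape(rank_prev * n_k, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, s.size)
        cores.append(u[:, :rank].reshape(rank_prev, n_k, rank))
        # Carry the singular-value-weighted right factor on to the next mode.
        remainder = s[:rank, None] * vt[:rank, :]
        rank_prev = rank
    cores.append(remainder.reshape(rank_prev, shape[-1], 1))
    return cores

def tt_to_vector(cores):
    """Contract TT cores back into a flat vector."""
    result = cores[0]  # shape (1, n_1, r_1)
    for core in cores[1:]:
        # Contract the trailing rank index with the next core's leading rank index.
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape(-1)

rng = np.random.default_rng(0)
embedding = rng.standard_normal(768)  # stand-in for one pre-trained 768-dim token embedding
shape = (4, 4, 4, 4, 3)               # hypothetical factorization: 4*4*4*4*3 = 768

full = tt_svd(embedding, shape, max_rank=64)       # cap exceeds every full TT rank: exact
print(np.allclose(tt_to_vector(full), embedding))  # True

small = tt_svd(embedding, shape, max_rank=2)       # truncated: far fewer stored values
print(sum(core.size for core in small))            # 62 values instead of 768
# A random vector has no low-rank structure, so this relative error is large.
rel_err = np.linalg.norm(tt_to_vector(small) - embedding) / np.linalg.norm(embedding)
print(rel_err)
```

Applied to a whole embedding layer, this procedure would run once per token row and store one set of cores per token. The rank cap is the knob that trades compression ratio against reconstruction error.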

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Residual Connection · Softmax · Dense Connections · Dropout · Layer Normalization