TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs
Yuxuan Gu, Wuyang Zhou, Giorgos Iacovides, Danilo Mandic

TL;DR
TensorLLM introduces a novel tensorisation and Tucker decomposition method to compress and denoise Multi-head Attention in LLMs, significantly enhancing reasoning abilities without extra training.
Contribution
The paper presents a new tensorisation framework for MHA weights that enables high-dimensional denoising and compression, improving LLM reasoning performance.
Findings
Achieves up to 250x compression of MHA weights.
Enhances reasoning capabilities across multiple benchmarks.
Can be combined with existing denoising techniques for further gains.
Abstract
The reasoning abilities of Large Language Models (LLMs) can be improved by structurally denoising their weights, yet existing techniques primarily focus on denoising the feed-forward network (FFN) of the transformer block, and cannot efficiently utilise the Multi-head Attention (MHA) block, which is the core of transformer architectures. To address this issue, we propose a novel intuitive framework that, at its very core, performs MHA compression through a multi-head tensorisation process and the Tucker decomposition. This enables both higher-dimensional structured denoising and compression of the MHA weights, by enforcing a shared higher-dimensional subspace across the weights of the multiple attention heads. We demonstrate that this approach consistently enhances the reasoning capabilities of LLMs across multiple benchmark datasets, and for both encoder-only and decoder-only…
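The idea of stacking per-head weights into one tensor and fitting a Tucker decomposition with factors shared across heads can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy dimensions, the ranks, the choice to stack four projection matrices per head, and the helper names (`unfold`, `mode_dot`, `shared_factor_tucker`, `reconstruct`) are all assumptions made for illustration, and a truncated higher-order SVD is used here as a simple stand-in for whatever fitting procedure the paper uses.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` of shape (new_dim, old_dim)."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def shared_factor_tucker(weights, ranks):
    """Truncated HOSVD over the first three modes of a 4D weight tensor.

    The last (head) mode is left uncompressed, so every head keeps its own
    small core slice while all heads share the same factor matrices.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(weights, mode), full_matrices=False)
        factors.append(u[:, :rank])          # leading left singular vectors
    core = weights
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)     # project onto the shared subspace
    return core, factors

def reconstruct(core, factors):
    """Map the core back to the original weight space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_dot(out, u, mode)
    return out

# Toy setup (hypothetical sizes): 4 heads, 4 matrices per head, each 32 x 8.
rng = np.random.default_rng(0)
d_model, d_head, n_mats, n_heads = 32, 8, 4, 4
ranks = (4, 3, 2)

# Synthetic "clean" weights that truly live in a shared low-rank subspace.
true_core = rng.standard_normal(ranks + (n_heads,))
true_factors = [rng.standard_normal((dim, r))
                for dim, r in zip((d_model, d_head, n_mats), ranks)]
clean = reconstruct(true_core, true_factors)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

core, factors = shared_factor_tucker(noisy, ranks)
denoised = reconstruct(core, factors)

err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)

# Parameter counts before and after compression.
n_original = noisy.size
n_compressed = core.size + sum(u.size for u in factors)
print(f"error before: {err_noisy:.3f}, after: {err_denoised:.3f}")
print(f"parameters: {n_original} -> {n_compressed}")
```

In this toy setting the compression comes from storing only the per-head cores and the three shared factor matrices, and the denoising comes from discarding the components of the weights that fall outside the shared subspace. The ratios obtained here depend entirely on the assumed sizes and ranks and say nothing about the figures reported in the paper.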
Taxonomy
Topics: Parallel Computing and Optimization Techniques
Methods: Softmax · Linear Layer · Attention Is All You Need · Multi-Head Attention · TuckER · Focus
