Tensor Product Attention Is All You Need

Yifan Zhang; Yifeng Liu; Huizhuo Yuan; Zhen Qin; Yang Yuan; Quanquan Gu; Andrew Chi-Chih Yao

arXiv:2501.06425 · cs.CL · January 13, 2026

TL;DR

This paper introduces Tensor Product Attention (TPA), a memory-efficient attention mechanism that uses tensor decompositions to shrink the key-value (KV) cache, enabling language models to process longer sequences without sacrificing performance.

Contribution

The paper proposes TPA, a novel attention method that reduces inference-time memory overhead and integrates seamlessly with Rotary Position Embedding (RoPE). Building on TPA, it introduces the T6 architecture, which matches or outperforms standard Transformer baselines.
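Why can RoPE be combined with a factorized key? A minimal sketch helps, assuming a key that is the mean of rank-R outer products between a head factor `a_r` (one entry per head) and a dimension factor `b_r` (length `head_dim`), and assuming standard RoPE that rotates adjacent dimension pairs. Under these assumptions RoPE acts linearly along the head dimension, so rotating only the `b_r` factors gives the same result as rotating every head of the assembled key. The names `rope`, `assemble`, `a_k`, `b_k` and the numeric values are hypothetical illustrations, not the paper's code.

```python
import math

def rope(vec, pos, base=10000.0):
    """Standard RoPE: rotate each adjacent pair (2k, 2k+1) by pos * base**(-2k/d)."""
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out[i] = x0 * c - x1 * s
        out[i + 1] = x0 * s + x1 * c
    return out

def assemble(a_factors, b_factors):
    """Mean of outer products a_r (n_heads) x b_r (head_dim) -> n_heads x head_dim."""
    rank = len(a_factors)
    n_heads, head_dim = len(a_factors[0]), len(b_factors[0])
    return [[sum(a_factors[r][h] * b_factors[r][j] for r in range(rank)) / rank
             for j in range(head_dim)] for h in range(n_heads)]

# Hypothetical rank-2 key factors for one token: 3 heads, head_dim 4.
a_k = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]
b_k = [[1.0, 2.0, -1.0, 0.5], [0.0, -1.0, 3.0, 1.0]]
pos = 7

# Path 1: build the full per-head key, then rotate every head's row.
rotated_full = [rope(row, pos) for row in assemble(a_k, b_k)]
# Path 2: rotate only the b factors (what a factor cache could store), then build.
rotated_factors = assemble(a_k, [rope(b, pos) for b in b_k])

max_err = max(abs(x - y) for r1, r2 in zip(rotated_full, rotated_factors)
              for x, y in zip(r1, r2))
print(f"max difference: {max_err:.2e}")
```

The two paths agree up to floating-point rounding, which is why, under this assumed form, position information can be applied to the small `b_r` factors before caching.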

Findings

01. T6 matches or surpasses standard Transformer baselines on language modeling tasks.

02. TPA significantly reduces KV cache size, enabling longer sequence processing.

03. T6 maintains competitive performance while improving memory and computational efficiency.
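To make the KV cache reduction concrete, here is back-of-the-envelope arithmetic. It is a sketch assuming that TPA caches, per token, one head factor (length `n_heads`) and one dimension factor (length `head_dim`) for each of `rank_k` key components and `rank_v` value components, while standard multi-head attention caches full keys and values. The configuration (32 heads, head dimension 128, 24 layers, 32,768 tokens, ranks of 2, 2-byte elements) is hypothetical and not taken from the paper.

```python
def mha_cache_per_token(n_heads, head_dim):
    # Standard MHA caches a full key and a full value vector for every head.
    return 2 * n_heads * head_dim

def tpa_cache_per_token(n_heads, head_dim, rank_k, rank_v):
    # Assumed TPA cache: per rank component, a head factor (n_heads entries)
    # plus a dimension factor (head_dim entries), for keys and values.
    return (rank_k + rank_v) * (n_heads + head_dim)

def cache_bytes(per_token, seq_len, n_layers, bytes_per_elem=2):
    # Total cache for one sequence across all layers (2 bytes = fp16/bf16).
    return per_token * seq_len * n_layers * bytes_per_elem

# Hypothetical configuration, not taken from the paper.
n_heads, head_dim, n_layers, seq_len = 32, 128, 24, 32_768
mha = mha_cache_per_token(n_heads, head_dim)
tpa = tpa_cache_per_token(n_heads, head_dim, rank_k=2, rank_v=2)
print(mha, tpa, mha / tpa)                                  # 8192 640 12.8
print(cache_bytes(mha, seq_len, n_layers) / 2**30, "GiB (MHA)")  # 12.0 GiB (MHA)
print(cache_bytes(tpa, seq_len, n_layers) / 2**30, "GiB (TPA)")  # 0.9375 GiB (TPA)
```

Because the factor cache grows with `n_heads + head_dim` rather than `n_heads * head_dim`, the savings under these assumptions come from small ranks; larger ranks trade memory back for expressiveness.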

Abstract

Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference. In this paper, we propose Tensor Product Attention (TPA), a novel attention mechanism that uses tensor decompositions to represent queries, keys, and values compactly, substantially shrinking the KV cache size at inference time. By factorizing these representations into contextual low-rank components and seamlessly integrating with Rotary Position Embedding (RoPE), TPA achieves improved model quality alongside memory efficiency. Based on TPA, we introduce the Tensor ProducT ATTenTion Transformer (T6), a new model architecture for sequence modeling. Through extensive empirical evaluation on language modeling tasks, we demonstrate that T6 surpasses or matches the performance of standard Transformer baselines including…
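The abstract describes factorizing queries, keys, and values into contextual low-rank components. The sketch below illustrates one way to read that description for a single token, assuming each per-head matrix is the mean of `rank` outer products between a head factor and a dimension factor, both produced by linear maps of the token embedding `x`. The weights `W_a` and `W_b`, the helper `tpa_project`, and all sizes are hypothetical; consult the paper and the official `tensorgi/t6` repository for the exact formulation.

```python
import random

def matvec(W, x):
    # Dense W @ x with W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def tpa_project(x, W_a, W_b, rank, n_heads, head_dim):
    """Contextual low-rank factorization of one token's per-head matrix.

    Assumed form: out = (1/rank) * sum_r outer(a_r(x), b_r(x)), where a_r(x)
    has n_heads entries and b_r(x) has head_dim entries, both linear in x.
    """
    a_flat = matvec(W_a, x)  # rank * n_heads values (head factors)
    b_flat = matvec(W_b, x)  # rank * head_dim values (dimension factors)
    out = [[0.0] * head_dim for _ in range(n_heads)]
    for r in range(rank):
        a = a_flat[r * n_heads:(r + 1) * n_heads]
        b = b_flat[r * head_dim:(r + 1) * head_dim]
        for i in range(n_heads):
            for j in range(head_dim):
                out[i][j] += a[i] * b[j] / rank
    return a_flat, b_flat, out

rng = random.Random(0)
d_model, n_heads, head_dim, rank = 16, 4, 8, 2
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
W_a = rand_matrix(rank * n_heads, d_model, rng)
W_b = rand_matrix(rank * head_dim, d_model, rng)

a_flat, b_flat, K_t = tpa_project(x, W_a, W_b, rank, n_heads, head_dim)
print(len(K_t), len(K_t[0]))       # 4 8  (full per-head key for this token)
print(len(a_flat) + len(b_flat))   # 24 factor values instead of 32 entries
```

Only `a_flat` and `b_flat` would need to be cached in this reading; the full matrix `K_t` has rank at most `rank`, so it can be rebuilt on demand.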


Code & Models

Repositories

tensorgi/t6
PyTorch · Official

Videos

Tensor Product Attention Is All You Need · SlidesLive

Taxonomy

Methods: Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Multi-Head Attention · Position-Wise Feed-Forward Layer