Structured Multidimensional Representation Learning for Large Language Models
Alaa El Ichi, Khalide Jbilou, Mohamed El Guide, Franck Dufrenois

TL;DR
This paper introduces a spectral tensor factorization for Transformer embeddings, enabling significant parameter reduction while maintaining or improving performance on NLP tasks.
Contribution
It proposes a spectral tensor decomposition method for Transformers that reduces parameters and introduces an inductive bias, improving efficiency and generalization.
Findings
Up to 75% encoder parameter reduction with maintained accuracy.
Spectral tensorization enables efficient parallel Transformer sub-models.
Method remains fully differentiable and compatible with existing training pipelines.
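The 75% figure can be checked with a rough parameter count. The sketch below is an illustration, not the authors' accounting. It assumes a standard encoder layer with four attention projections and a two-layer feed-forward block of width 4d, ignores biases and normalization, and assumes the spectral variant replaces one width-d layer with p independent layers of width d/p. The function names and the choice p = 4 are hypothetical.

```python
def encoder_layer_params(d_model: int, ffn_mult: int = 4) -> int:
    """Weight-matrix parameters of one encoder layer (biases and LayerNorm ignored)."""
    attention = 4 * d_model * d_model                   # Q, K, V and output projections
    feed_forward = 2 * d_model * (ffn_mult * d_model)   # two linear maps of the FFN
    return attention + feed_forward


def spectral_layer_params(d_model: int, p: int, ffn_mult: int = 4) -> int:
    """Assumed count for p independent sub-layers, each of width d_model / p."""
    if d_model % p != 0:
        raise ValueError("d_model must be divisible by p")
    return p * encoder_layer_params(d_model // p, ffn_mult)


def reduction(d_model: int, p: int) -> float:
    """Fraction of weight parameters removed relative to the dense layer."""
    dense = encoder_layer_params(d_model)
    spectral = spectral_layer_params(d_model, p)
    return 1.0 - spectral / dense


if __name__ == "__main__":
    print(reduction(768, 4))
```

Under these assumptions the weight count scales as d²/p, so the reduction is 1 - 1/p. With d_model = 768 and p = 4 the function returns 0.75, which matches the reported upper figure.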
Abstract
Transformer architectures achieve state-of-the-art performance across a wide range of pattern recognition and natural language processing tasks, but their scaling is accompanied by substantial parameter growth and redundancy in the embedding dimension. In this work, we introduce a structured spectral factorization of the embedding space based on the L-product for third-order tensors. By reshaping token representations into spectral tensor slices and performing attention and feed-forward operations in the transform domain, we obtain a Tensor Transformer architecture that decomposes the encoder into p independent spectral sub-transformers while preserving standard Transformer semantics. We prove that the proposed L-Transformer is spectrally equivalent to p parallel Transformers operating on reduced-dimensional embeddings, which yields approximately 1/p reduction (up to lower-order terms…
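The L-product that the abstract builds on can be sketched in a few lines of NumPy. This is a minimal illustration of the general definition, not the paper's implementation: transform both tensors along the third mode with an invertible matrix M, multiply matching frontal slices, and transform back. The function names, the tensor shapes, and the choice of a random orthogonal M are assumptions made for the example.

```python
import numpy as np


def mode3_transform(tensor: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Apply M along the third axis: out[:, :, k] = sum_j M[k, j] * tensor[:, :, j]."""
    return np.einsum("kj,abj->abk", M, tensor)


def l_product(A: np.ndarray, B: np.ndarray, M: np.ndarray) -> np.ndarray:
    """L-product of A (n1 x n2 x p) and B (n2 x n3 x p) for an invertible p x p matrix M."""
    A_hat = mode3_transform(A, M)
    B_hat = mode3_transform(B, M)
    # In the transform domain the product decouples into p independent matrix products.
    C_hat = np.einsum("abk,bck->ack", A_hat, B_hat)
    return mode3_transform(C_hat, np.linalg.inv(M))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 4
    A = rng.standard_normal((3, 5, p))
    B = rng.standard_normal((5, 2, p))
    M, _ = np.linalg.qr(rng.standard_normal((p, p)))  # hypothetical orthogonal transform
    C = l_product(A, B, M)
    print(C.shape)
```

The slice-wise product in the transform domain is what lets the p slices be processed independently, which is the property behind the p parallel sub-transformers described above.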
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Generative Adversarial Networks and Image Synthesis
