Structured Multidimensional Representation Learning for Large Language Models

Alaa El Ichi; Khalide Jbilou; Mohamed El Guide; Franck Dufrenois

arXiv:2603.05727·cs.CL·March 9, 2026

TL;DR

This paper introduces a spectral tensor factorization for Transformer embeddings, enabling significant parameter reduction while maintaining or improving performance on NLP tasks.

Contribution

It proposes a spectral tensor decomposition method for Transformers that reduces parameters and introduces an inductive bias, improving efficiency and generalization.

Findings

01. Up to 75% encoder parameter reduction with maintained accuracy.

02. Spectral tensorization enables efficient parallel Transformer sub-models.

03. Method remains fully differentiable and compatible with existing training pipelines.
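A back-of-the-envelope count can illustrate why splitting an encoder into p narrower sub-encoders shrinks the weight count. The sketch below is not taken from the paper: it assumes a textbook encoder layer (four d × d attention matrices plus a feed-forward block of width 4d), ignores biases, LayerNorm, and embeddings, and uses hypothetical function names. Under those assumptions, p = 4 gives a 75% reduction, which matches the figure in finding 01; whether p = 4 is the configuration behind that figure is not stated here.

```python
def encoder_layer_weights(d_model: int, ffn_mult: int = 4) -> int:
    """Weight-matrix entries in one textbook encoder layer.

    Assumption: biases and LayerNorm parameters are ignored.
    """
    attention = 4 * d_model * d_model                   # W_Q, W_K, W_V, W_O
    feed_forward = 2 * d_model * (ffn_mult * d_model)   # two FFN matrices
    return attention + feed_forward


def split_encoder_weights(d_model: int, p: int, ffn_mult: int = 4) -> int:
    """Weights in p independent sub-layers, each of width d_model // p."""
    if d_model % p != 0:
        raise ValueError("d_model must be divisible by p")
    return p * encoder_layer_weights(d_model // p, ffn_mult)


def reduction(d_model: int, p: int) -> float:
    """Fraction of layer weights removed by the split."""
    full = encoder_layer_weights(d_model)
    return 1.0 - split_encoder_weights(d_model, p) / full


# Illustrative width only; 768 is an assumption, not a value from the paper.
for p in (1, 2, 4, 8):
    print(f"p={p}: {reduction(768, p):.1%} fewer layer weights")
```

Because every weight matrix scales quadratically with width, p blocks of width d/p cost p · (d/p)² = d²/p entries, so the surviving fraction is 1/p regardless of the chosen d.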

Abstract

Transformer architectures achieve state-of-the-art performance across a wide range of pattern recognition and natural language processing tasks, but their scaling is accompanied by substantial parameter growth and redundancy in the embedding dimension. In this work, we introduce a structured spectral factorization of the embedding space based on the L-product for third-order tensors. By reshaping token representations into spectral tensor slices and performing attention and feed-forward operations in the transform domain, we obtain a Tensor Transformer architecture that decomposes the encoder into p independent spectral sub-transformers while preserving standard Transformer semantics. We prove that the proposed L-Transformer is spectrally equivalent to p parallel Transformers operating on reduced-dimensional embeddings, which yields an approximately 1/p reduction (up to lower-order terms…
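The abstract's central mechanism, an L-product that turns one tensor-level operation into p independent slice-level operations, can be sketched with NumPy. This is an illustration under stated assumptions, not the authors' implementation: it uses the usual definition of the L-product (transform along the third mode with an invertible matrix M, multiply matching frontal slices, transform back), and the reshaping layout, the random orthogonal choice of M, the tensor sizes, and all function names are hypothetical.

```python
import numpy as np


def mode3(X, M):
    """Apply M along the third axis: Y[a, b, k] = sum_j M[k, j] * X[a, b, j]."""
    return np.einsum("kj,abj->abk", M, X)


def facewise(A_hat, B_hat):
    """Multiply matching frontal slices: C[:, :, k] = A[:, :, k] @ B[:, :, k]."""
    return np.einsum("abk,bck->ack", A_hat, B_hat)


def l_product(A, B, M):
    """L-product: transform both tensors, multiply slice by slice, transform back."""
    return mode3(facewise(mode3(A, M), mode3(B, M)), np.linalg.inv(M))


def tokens_to_tensor(X, p):
    """Hypothetical layout: (n_tokens, d) -> (n_tokens, d // p, p)."""
    n, d = X.shape
    return X.reshape(n, d // p, p)


rng = np.random.default_rng(0)
n_tokens, d, p = 5, 12, 4                          # toy sizes (assumption)
M, _ = np.linalg.qr(rng.standard_normal((p, p)))   # random orthogonal transform (assumption)

X = tokens_to_tensor(rng.standard_normal((n_tokens, d)), p)
W = rng.standard_normal((d // p, d // p, p))       # tensor weight: one slice per sub-model

Y = l_product(X, W, M)                             # tensor-level linear layer

# The same result, computed as p independent small matrix products
# on the transformed slices.
X_hat, W_hat = mode3(X, M), mode3(W, M)
Y_hat_slices = np.stack(
    [X_hat[:, :, k] @ W_hat[:, :, k] for k in range(p)], axis=2
)

assert np.allclose(mode3(Y, M), Y_hat_slices)
assert W.size == d * d // p                        # d^2 / p entries vs d^2 for a dense layer
```

In the transform domain each of the p slices only ever interacts with its own weight slice, which is the sense in which a tensor layer behaves like p parallel narrow layers. The sketch covers a single linear map; attention, normalization, and the nonlinearities of a full encoder are omitted.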

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Generative Adversarial Networks and Image Synthesis