MetaTT: A Global Tensor-Train Adapter for Parameter-Efficient Fine-Tuning
Javier Lopez-Piqueres, Pranav Deshpande, Archan Ray, Mattia J. Villani, Marco Pistoia, Niraj Kumar

TL;DR
MetaTT introduces a tensor train adapter framework for efficient fine-tuning of transformers, achieving competitive performance with fewer parameters and enabling adaptive optimization strategies.
Contribution
It proposes a novel tensor train-based adapter for transformers that reduces parameter count and improves multi-task learning efficiency.
Findings
MetaTT achieves competitive accuracy with fewer parameters.
It outperforms or matches state-of-the-art methods on language modeling benchmarks.
The rank-adaptive optimizer improves training efficiency and model performance.
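The reviews describe the rank-adaptive scheme as inspired by DMRG. The sketch below is not the paper's optimizer. It shows only the generic two-site step that DMRG-style methods use to change a TT rank: contract two neighboring cores, take an SVD, and keep the leading singular values. The function name `merge_and_split`, the core shapes, and the `max_rank` and `tol` parameters are assumptions made for illustration.

```python
import numpy as np

def merge_and_split(left, right, max_rank, tol=1e-10):
    """Re-factorize two adjacent TT cores with a truncated SVD.

    left has shape (r0, n1, r1) and right has shape (r1, n2, r2).
    Returns new cores whose shared bond is at most max_rank.
    Illustrative helper, not taken from the paper.
    """
    r0, n1, _ = left.shape
    _, n2, r2 = right.shape
    # Contract the shared bond to form a two-site tensor (r0, n1, n2, r2).
    theta = np.tensordot(left, right, axes=([2], [0]))
    mat = theta.reshape(r0 * n1, n2 * r2)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    # Keep singular values above a relative tolerance, capped at max_rank.
    keep = max(1, min(max_rank, int(np.sum(s > tol * s[0]))))
    new_left = u[:, :keep].reshape(r0, n1, keep)
    new_right = (s[:keep, None] * vt[:keep]).reshape(keep, n2, r2)
    return new_left, new_right

rng = np.random.default_rng(0)
left = rng.standard_normal((2, 3, 10))   # bond of 10 is larger than needed
right = rng.standard_normal((10, 4, 2))

# The merged matrix is 6 x 8, so the bond can shrink to 6 without loss.
exact_left, exact_right = merge_and_split(left, right, max_rank=10)
# A tighter cap trades accuracy for fewer parameters.
small_left, small_right = merge_and_split(left, right, max_rank=3)

before = np.tensordot(left, right, axes=([2], [0]))
after = np.tensordot(exact_left, exact_right, axes=([2], [0]))
```

In this toy case the bond drops from 10 to 6 with no change to the represented tensor, while `max_rank=3` forces a lossy truncation. How and when the paper applies such a step during training is not specified in this summary.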
Abstract
We present MetaTT, a Tensor Train (TT) adapter framework for fine-tuning pre-trained transformers. MetaTT enables flexible and parameter-efficient model adaptation by using a single shared TT to factorize transformer sub-modules. This factorization indexes key structural dimensions, including layer and matrix type, and can optionally incorporate heads and tasks. This design allows MetaTT's parameter count to scale with the sum, rather than the product, of the modes, resulting in a substantially more compact adapter. Our benchmarks compare MetaTT with LoRA and with recent state-of-the-art fine-tuning methods based on matrix and tensor decompositions. We observe that, when tested on standard single-task language modeling benchmarks, MetaTT achieves a competitive tradeoff between parameter efficiency and accuracy. We further demonstrate that MetaTT performs competitively when compared to…
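The claim that the parameter count scales with the sum rather than the product of the modes can be illustrated with a small count. The sketch below assumes a four-core TT with modes ordered as (input dimension, layer, matrix type, output dimension) and a single uniform rank. That ordering, the core shapes, and all function names are assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np

def tt_adapter_params(d_in, d_out, n_layers, n_matrices, rank):
    """Parameters in an assumed 4-core TT: (d_in, layer, matrix, d_out)."""
    boundary = d_in * rank + rank * d_out               # two matrix-shaped end cores
    interior = (n_layers + n_matrices) * rank * rank    # two 3-way middle cores
    return boundary + interior

def lora_params(d_in, d_out, n_layers, n_matrices, rank):
    """Parameters when every (layer, matrix) pair gets its own rank-r pair."""
    return n_layers * n_matrices * rank * (d_in + d_out)

def tt_delta(cores, layer, matrix):
    """Weight update for one (layer, matrix) slot, read off the shared TT."""
    g_in, g_layer, g_matrix, g_out = cores
    return g_in @ g_layer[:, layer, :] @ g_matrix[:, matrix, :] @ g_out

# Hypothetical sizes chosen only to keep the example small.
rng = np.random.default_rng(0)
d, n_layers, n_matrices, r = 64, 12, 4, 8
cores = (
    rng.standard_normal((d, r)),
    rng.standard_normal((r, n_layers, r)),
    rng.standard_normal((r, n_matrices, r)),
    rng.standard_normal((r, d)),
)

delta = tt_delta(cores, layer=3, matrix=1)   # one d x d update of rank <= r
n_tt = tt_adapter_params(d, d, n_layers, n_matrices, r)
n_lora = lora_params(d, d, n_layers, n_matrices, r)
print(n_tt, n_lora)
```

Under these assumptions the shared TT costs 2·D·r + (L + M)·r², whereas independent low-rank pairs cost L·M·r·2D, so adding layers or matrix types grows the TT count additively. Optional head or task modes would add further cores of size r × n × r in the same way.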
Peer Reviews
Decision: Submitted to ICLR 2026
This is an elegant unified approach to compressing PEFT matrices to reduce parameter counts. It also elegantly extends to multi-task PEFT, allowing shared structure across tasks. Empirical evaluations are done on a good variety of tasks, using three versions of the model, each an extension of the previous model. Results are generally good or comparable to previous methods, but with greatly reduced parameter counts.
The novelty is not high. There has been a lot of work on PEFT already, and this work does not add much conceptual or theoretical novelty. The contribution is in identifying a general-purpose mathematical framework which addresses PEFT in a consistent way, rather than a collection of ad-hoc methods. The empirical results do not demonstrate any breakthroughs with respect to previous work.
The paper is well written and easy to follow. The contribution and references to prior work are clear, the approach is sound, and the experiments not only include the standard benchmarks but also illustrate particularities of the proposed methods. Both TT parameterizations (e.g. LoTR) and fifth-order tensor adapter models that use layers, input and output dimensions, and heads (e.g. LoRTA) have been proposed, but not their conjunction. Secondly, treating tasks as an additional dimension is, to the best of my k
I think that the rank-adaptive optimization scheme is a strong contribution. The experiments that showcase its benefits are centered on the standard NLU setting with RoBERTa, but I think it would be useful to extend the empirical analysis to at least one, and ideally all, of the other benchmarks, tasks, and models, in order to further substantiate the empirical gains from this scheme in settings that are regarded as more challenging.
1. The proposed use of a single shared TT for compressing all transformer layers and sub-modules is novel and shows promise in reducing the parameter count while maintaining competitive performance.
2. The rank-adaptive training inspired by DMRG is an interesting integration of techniques from quantum physics into machine learning, showcasing interdisciplinarity and potential for further exploration.
3. The paper compares MetaTT with several state-of-the-art PEFT methods, including LoRA and LoTR
1. Despite the novelty of the approach, the reported performance improvements are marginal or absent in most cases compared to simpler baselines like LoRA, especially given the significant computational complexity added by TT decomposition and rank-adaptive training.
2. While the DMRG-inspired optimizer is presented as a key contribution, its practical benefits over standard optimizers like AdamW are not convincingly demonstrated. The rank-adaptive approach introduces additional training complex
Taxonomy
Topics: Tensor decomposition and applications · Quantum many-body systems · Machine Learning in Materials Science
Methods: Adapter · Adam
