Mixture of Latent Experts Using Tensor Products
Zhan Su, Fengran Mo, Prayag Tiwari, Benyou Wang, Jian-Yun Nie, Jakob Grue Simonsen

TL;DR
This paper introduces TensorPoly, a modular language model that combines a tensor-product reparameterization of LoRA with dedicated routing functions, improving multi-task learning efficiency and performance by mitigating negative transfer.
Contribution
The paper proposes TensorPoly, a modular language model that pairs a tensor-product reparameterization of LoRA (TLoRA) with routing functions tailored to different granularities, with the aim of improving multi-task learning and parameter efficiency.
Findings
Modular LMs outperform dense models in multi-task benchmarks.
TensorPoly-I achieves higher parameter efficiency and better performance.
The approach mitigates negative transfer in multi-task learning.
Abstract
In multi-task learning, the conventional approach involves training a model on multiple tasks simultaneously. However, the training signals from different tasks can interfere with one another, potentially leading to negative transfer. To mitigate this, we investigate whether modular language models can facilitate positive transfer and systematic generalization. Specifically, we propose a novel modular language model (TensorPoly) that balances parameter efficiency with nuanced routing methods. For modules, we reparameterize Low-Rank Adaptation (LoRA) by employing an entangled tensor through the use of tensor product operations and name the resulting approach TLoRA. For the routing function, we tailor two innovative routing functions according to the granularity: TensorPoly-I which directs to each rank within the entangled tensor…
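To make the idea of an "entangled tensor" built from tensor products more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`kron`, `entangled_vector`, `softmax`), the shapes, and the use of softmax-normalized per-rank weights are all assumptions chosen for illustration. It only shows how a larger parameter vector could be assembled as a weighted sum, over ranks, of tensor products of small factor vectors, with one routing weight per rank.

```python
from functools import reduce
import math


def kron(u, v):
    """Kronecker (tensor) product of two vectors, flattened to a list."""
    return [a * b for a in u for b in v]


def softmax(logits):
    """Numerically stable softmax; stands in for a hypothetical router."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entangled_vector(factors, rank_weights):
    """Weighted sum over ranks of the tensor product of each rank's factors.

    factors[r] is a list of small vectors for rank r (hypothetical layout);
    rank_weights[r] is the routing weight assigned to rank r.
    """
    result = None
    for weight, rank_factors in zip(rank_weights, factors):
        term = [weight * x for x in reduce(kron, rank_factors)]
        result = term if result is None else [a + b for a, b in zip(result, term)]
    return result


# Two ranks, each the tensor product of two length-2 factor vectors.
factors = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
weights = softmax([0.0, 0.0])  # equal logits give [0.5, 0.5]
mixed = entangled_vector(factors, weights)
print(mixed)  # [1.5, 2.5, 3.0, 4.0]
```

In this sketch each rank stores only the entries of its small factor vectors, while the product spans a vector whose length is the product of the factor lengths. In a LoRA-style setting, vectors produced this way would presumably be arranged into the low-rank update matrices, but that arrangement is not specified in the text above and is left out here.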
Taxonomy
Topics: Data-Driven Disease Surveillance · Forecasting Techniques and Applications · Anomaly Detection Techniques and Applications
