LaX: Boosting Low-Rank Training of Foundation Models via Latent Crossing

Ruijie Zhang; Ziyue Liu; Zhengyang Wang; Zheng Zhang

arXiv:2505.21732 · cs.LG · October 28, 2025

Open Access

TL;DR

LaX is a simple module that enhances low-rank variants of foundation models, enabling them to perform on par with their full-rank counterparts while using significantly fewer parameters.

Contribution

Introduces Latent Crossing (LaX), a plug-and-play module that improves low-rank model capacity by facilitating information flow across subspaces.
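
The idea of information flowing across low-rank subspaces can be illustrated with a small sketch. The listing here does not spell out how the crossing is implemented, so the code below is an assumption, not the authors' method: it treats each low-rank layer as y = B(Ax) and, hypothetically, adds the previous layer's latent vector to the current one before the up-projection. The names `matvec`, `low_rank_layer`, and `prev_latent` are illustrative, and both layers are assumed to share the same rank.

```python
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product for small illustrative matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def low_rank_layer(
    x: Vector,
    A: Matrix,  # down-projection, shape (rank, d_in)
    B: Matrix,  # up-projection, shape (d_out, rank)
    prev_latent: Optional[Vector] = None,
) -> Tuple[Vector, Vector]:
    """Low-rank layer y = B(Ax) with an optional, hypothetical latent crossing.

    If prev_latent is given, it is added to this layer's latent vector
    before the up-projection (assumed residual-style crossing).
    Returns (output, latent) so the latent can be passed to the next layer.
    """
    z = matvec(A, x)
    if prev_latent is not None:
        z = [zi + pi for zi, pi in zip(z, prev_latent)]
    return matvec(B, z), z


# Two stacked rank-2 layers with identity factors, for easy inspection.
I2 = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 2.0]

y1, z1 = low_rank_layer(x, I2, I2)
y2_plain, _ = low_rank_layer(y1, I2, I2)                    # no crossing
y2_cross, _ = low_rank_layer(y1, I2, I2, prev_latent=z1)    # with crossing
```

In this toy setup the crossing adds no new parameters: the second layer simply reuses a latent vector the first layer already computed. A real module may add a gate or projection on the crossed path; that detail is not given here.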

Findings

01 LaX boosts low-rank model performance to match or exceed full-rank baselines.

02 With LaX, low-rank models use 2-3× fewer parameters than the full-rank baselines.

03 When combined with low-rank adapters (LoRA) for fine-tuning LLaMA-7B/13B, LaX improves performance on arithmetic and commonsense reasoning tasks.

Abstract

Training foundation models such as ViTs and LLMs requires tremendous computing cost. Low-rank matrix or tensor factorization offers a parameter-efficient alternative, but often degrades performance due to the restricted parameter space. In this work, we introduce Latent Crossing (LaX), a simple yet effective plug-and-play module that enhances the capacity of low-rank models by enabling information flow across low-rank subspaces. We extensively validate the benefits of LaX on pre-training tasks with ViT-Base/Large and LLaMA-like models ranging from 60M to 1B parameters. LaX boosts low-rank model performance to match or exceed the full-rank baselines while using 2-3× fewer parameters. When equipped with low-rank adapters (i.e., LoRA) for fine-tuning LLaMA-7B/13B, LaX consistently improves performance on arithmetic and commonsense reasoning tasks with negligible…
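
To make the "2-3× fewer parameters" figure concrete, the following sketch counts the weights of a single linear layer before and after a rank-r matrix factorization W ≈ BA. The layer width (768) and the ranks (192, 128) are illustrative assumptions chosen to land on 2× and 3×; they are not settings reported by the paper, and the function names are hypothetical.

```python
def full_rank_params(d_in: int, d_out: int) -> int:
    """Weights in a dense d_out x d_in matrix (bias ignored)."""
    return d_in * d_out


def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def compression_ratio(d_in: int, d_out: int, rank: int) -> float:
    """How many times fewer weights the factorized layer has."""
    return full_rank_params(d_in, d_out) / low_rank_params(d_in, d_out, rank)


# Illustrative square layer of width 768 (an assumed size).
ratio_r192 = compression_ratio(768, 768, 192)  # 589824 / 294912
ratio_r128 = compression_ratio(768, 768, 128)  # 589824 / 196608
```

For a square layer of width d, the ratio simplifies to d / (2r), so a 2× reduction corresponds to r = d/4 and a 3× reduction to r = d/6.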

Taxonomy

Topics: Tensor decomposition and applications · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications