TAS-LoRA: Transformer Architecture Search with Mixture-of-LoRA Experts
Jeimin Jeon, Hyunju Lee, Bumsub Ham

TL;DR
TAS-LoRA is a parameter-efficient method that uses a Mixture-of-LoRA Experts to improve transformer architecture search: it lets subnets learn subnet-specific features and reduces feature collapse.
Contribution
It proposes TAS-LoRA, a novel approach combining LoRA and a dynamic router to enhance subnet learning in transformer architecture search.
Findings
TAS-LoRA outperforms existing TAS methods on ImageNet.
It effectively mitigates feature collapse in supernet training.
It demonstrates strong transfer learning performance across multiple benchmarks.
Abstract
Transformer architecture search (TAS) discovers optimal vision transformer (ViT) architectures automatically, reducing the human effort needed to design ViTs manually. However, existing TAS methods suffer from the feature collapse problem, where subnets within a supernet fail to learn subnet-specific features, mainly due to the shared weights in a supernet, limiting the performance of individual subnets. To address this, we propose TAS-LoRA, a novel method that introduces parameter-efficient low-rank adaptation (LoRA) to enable subnet-specific feature learning while maintaining computational efficiency. TAS-LoRA incorporates a Mixture-of-LoRA Experts (MoLE) strategy, where a lightweight router dynamically assigns LoRA experts based on subnet architectures, and introduces a group-wise router initialization technique to encourage diverse feature learning across experts early in training. Extensive…
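To make the MoLE idea in the abstract concrete, the following is a minimal NumPy sketch of one linear layer whose shared supernet weight is adjusted by a router-weighted mix of LoRA experts. It is not the authors' implementation. The class name `MoLELinear`, the three-number architecture encoding, the linear softmax router, and the soft (rather than hard) expert assignment are all assumptions made for illustration, and the group-wise router initialization is not shown because the abstract does not specify it.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


class MoLELinear:
    """Hypothetical shared linear layer with a mixture of LoRA experts.

    The effective weight is W + sum_k g_k * (B_k @ A_k), where the gate
    vector g depends only on an encoding of the sampled subnet architecture.
    """

    def __init__(self, d_in, d_out, rank, num_experts, arch_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Weight shared by every subnet in the supernet.
        self.W = rng.normal(0.0, 0.02, (d_out, d_in))
        # LoRA factors per expert; B starts at zero (the usual LoRA choice),
        # so the layer initially behaves like the plain shared weight.
        self.A = rng.normal(0.0, 0.02, (num_experts, rank, d_in))
        self.B = np.zeros((num_experts, d_out, rank))
        # Assumed router: one linear map from architecture encoding to logits.
        self.R = rng.normal(0.0, 0.02, (num_experts, arch_dim))

    def gates(self, arch):
        """Mixing weights over experts for one architecture encoding."""
        return softmax(self.R @ arch)

    def forward(self, x, arch):
        g = self.gates(arch)
        # Low-rank update: sum over experts k of g[k] * B[k] @ A[k].
        delta = np.einsum("k,kor,kri->oi", g, self.B, self.A)
        return x @ (self.W + delta).T


layer = MoLELinear(d_in=8, d_out=4, rank=2, num_experts=3, arch_dim=3)
tokens = np.ones((5, 8))
# Hypothetical encoding, e.g. normalized (embedding dim, heads, MLP ratio).
subnet = np.array([0.5, 0.25, 1.0])
out = layer.forward(tokens, subnet)  # shape (5, 4)
```

Because the gates depend on the architecture encoding rather than on the input tokens, two subnets that share `W` can still receive different low-rank updates, which is the mechanism the abstract credits with enabling subnet-specific features.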
