TAS-LoRA: Transformer Architecture Search with Mixture-of-LoRA Experts

Jeimin Jeon; Hyunju Lee; Bumsub Ham

arXiv:2605.07256 · cs.CV · May 11, 2026

TL;DR

TAS-LoRA introduces a parameter-efficient method that uses a Mixture-of-LoRA Experts to improve transformer architecture search. It lets subnets learn subnet-specific features and reduces feature collapse.
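
As a rough illustration of the idea in the TL;DR, here is a minimal sketch of a weight-shared supernet layer paired with a single LoRA adapter. It assumes AutoFormer-style weight sharing, where a subnet reuses the top-left block of the supernet weight. This page does not confirm that TAS-LoRA shares weights this way. `slice_weight`, `LoRALinear`, and the toy sizes are hypothetical. The adapter follows the standard LoRA form y = Wx + (alpha/r)BAx, with A drawn at random and B set to zero so that the adapter starts as a no-op.

```python
import random


def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def slice_weight(W_super, d_out, d_in):
    """Assumed weight sharing: a subnet reuses the top-left d_out x d_in block
    of the supernet weight, so smaller subnets share entries with larger ones."""
    return [row[:d_in] for row in W_super[:d_out]]


class LoRALinear:
    """Shared (sliced) supernet weight plus one low-rank adapter:
    y = W x + (alpha / r) * B (A x)."""

    def __init__(self, W_super, d_out, d_in, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.W = slice_weight(W_super, d_out, d_in)  # shared across subnets
        # Standard LoRA init: A small random (r x d_in), B zero (d_out x r).
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def __call__(self, x):
        base = matvec(self.W, x)                     # shared path
        delta = matvec(self.B, matvec(self.A, x))    # subnet-specific low-rank path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_adapter_params(self):
        # r * d_in entries in A plus d_out * r entries in B.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: a toy 8x8 supernet weight; this subnet uses its 4x4 slice.
rng = random.Random(42)
W_super = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(8)]
x = [1.0, 0.5, -0.5, 2.0]

small = LoRALinear(W_super, d_out=4, d_in=4, rank=1)
shared_out = matvec(slice_weight(W_super, 4, 4), x)
print(small(x) == shared_out)      # True: B is zero, so only the shared path contributes
print(small.num_adapter_params())  # 8 adapter parameters vs. 16 in the 4x4 slice

small.B[0][0] = 1.0                # stand-in for a trained adapter
print(small(x) == shared_out)      # False: the subnet now departs from the shared path
```

With rank r, an adapter adds r(d_in + d_out) parameters instead of d_in x d_out. This is what keeps subnet-specific adapters cheap compared with duplicating the shared weight for every subnet.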

Contribution

The paper proposes TAS-LoRA, a novel approach that combines LoRA with a dynamic router to improve subnet learning in transformer architecture search.
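
The dynamic router can be pictured as a mixture over several LoRA experts. The mixing weights come from the subnet's architecture rather than from the input tokens. This is a minimal sketch that rests on assumptions this page does not state:

- The router is a single linear layer with softmax gating over all experts. TAS-LoRA may instead use sparse or top-k routing.
- The architecture is encoded by dividing each searched dimension by its maximum.
- The search-space maxima in `MAX` are made up.

`MoLELinear`, `encode_arch`, and the other names are hypothetical. For brevity, both subnets below share one 4x4 weight instead of differently sized slices.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def encode_arch(arch, max_vals):
    """Hypothetical encoding: each searched dimension divided by its maximum
    in the search space, in a fixed (sorted-key) order."""
    return [arch[k] / max_vals[k] for k in sorted(max_vals)]


class MoLELinear:
    """Shared weight W plus K LoRA experts mixed by an architecture-conditioned router."""

    def __init__(self, W, num_experts, rank, arch_dim, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # shared supernet weight (assumed already sliced for the subnet)
        self.A = [[[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
                  for _ in range(num_experts)]
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(num_experts)]
        # Lightweight router: one linear map from the arch encoding to expert logits.
        self.R = [[rng.gauss(0.0, 1.0) for _ in range(arch_dim)] for _ in range(num_experts)]
        self.r_bias = [0.0] * num_experts

    def gates(self, arch_vec):
        logits = [logit + b for logit, b in zip(matvec(self.R, arch_vec), self.r_bias)]
        return softmax(logits)

    def __call__(self, x, arch_vec):
        g = self.gates(arch_vec)  # one gate vector per subnet, shared by all inputs
        out = matvec(self.W, x)
        for g_k, A_k, B_k in zip(g, self.A, self.B):
            delta = matvec(B_k, matvec(A_k, x))
            out = [o + g_k * d for o, d in zip(out, delta)]
        return out


# Hypothetical search-space maxima and two subnets from that space.
MAX = {"embed_dim": 640, "mlp_ratio": 4.0, "num_heads": 10, "depth": 16}
smallest = encode_arch({"embed_dim": 192, "mlp_ratio": 3.5, "num_heads": 3, "depth": 12}, MAX)
largest = encode_arch({"embed_dim": 640, "mlp_ratio": 4.0, "num_heads": 10, "depth": 16}, MAX)

rng = random.Random(1)
W = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(4)]
layer = MoLELinear(W, num_experts=4, rank=2, arch_dim=len(MAX))
print([round(p, 3) for p in layer.gates(smallest)])  # expert weights for the small subnet
print([round(p, 3) for p in layer.gates(largest)])   # a different mixture for the large one
```

In this sketch the gates depend only on the architecture. The mixed update sum_k g_k B_k A_k is therefore constant for a given subnet and could be merged into W once that subnet is selected. That way the mixture adds no per-token routing cost at inference.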

Findings

1. TAS-LoRA outperforms existing TAS methods on ImageNet.
2. It effectively mitigates feature collapse in supernet training.
3. It demonstrates strong transfer learning performance across multiple benchmarks.

Abstract

Transformer architecture search (TAS) automatically discovers optimal vision transformer (ViT) architectures, which reduces the human effort required to design ViTs manually. However, existing TAS methods suffer from the feature collapse problem, in which subnets within a supernet fail to learn subnet-specific features. The problem arises mainly from the weights shared in a supernet, and it limits the performance of individual subnets. To address this, we propose TAS-LoRA, a novel method that introduces parameter-efficient low-rank adaptation (LoRA) to enable subnet-specific feature learning while maintaining computational efficiency. TAS-LoRA incorporates a Mixture-of-LoRA Experts (MoLE) strategy, in which a lightweight router dynamically assigns LoRA experts based on subnet architectures. It also introduces a group-wise router initialization technique to encourage diverse feature learning across experts early in training. Extensive…
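
The abstract does not describe how group-wise router initialization works. One hypothetical reading goes like this: bin subnets into groups by size, then initialize a per-group router bias so that each group starts out favoring a different expert. With an all-zero initialization, every subnet would be routed to the same uniform mixture at the start. No expert would then be pointed toward any particular kind of subnet. The size proxy, the bin boundaries, the `strength` value, and all function names are assumptions for illustration, not details from the paper.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def group_index(size, boundaries):
    """Bin a subnet by a size proxy (hypothetical: parameters in millions).
    `boundaries` must be sorted ascending; returns 0 .. len(boundaries)."""
    return sum(1 for b in boundaries if size >= b)


def init_group_bias(num_groups, num_experts, strength=2.0):
    """Hypothetical group-wise router init: group g's initial logits favor
    expert g % num_experts, so different subnet groups start on different experts."""
    table = []
    for g in range(num_groups):
        row = [0.0] * num_experts
        row[g % num_experts] = strength
        table.append(row)
    return table


def initial_gates(size, boundaries, bias_table):
    """Router gates at initialization for a subnet of the given size."""
    return softmax(bias_table[group_index(size, boundaries)])


boundaries = [10.0, 30.0]  # three hypothetical size groups: <10M, 10-30M, >=30M
table = init_group_bias(num_groups=len(boundaries) + 1, num_experts=3)

for size in (6.0, 20.0, 50.0):
    gates = initial_gates(size, boundaries, table)
    print(size, [round(p, 3) for p in gates])  # each group peaks on a different expert

# For contrast: an all-zero init gives every subnet the same uniform gates.
print(softmax([0.0, 0.0, 0.0]))
```

In a full model, this bias would be added to the architecture-conditioned router logits and then trained normally. The initialization only decides which expert each group leans on at the start of training.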
