One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Arnav Chavan, Zhuang Liu, Deepak Gupta, Eric Xing, and Zhiqiang Shen

TL;DR
GLoRA is a generalized, flexible, and efficient fine-tuning method that enhances pre-trained models across various tasks without extra inference costs, outperforming previous approaches in accuracy and resource usage.
Contribution
The paper introduces GLoRA, a generalized prompt module combined with a layer-wise structure search, to improve parameter-efficient fine-tuning across diverse tasks and domains.
Findings
Outperforms previous methods in vision and language benchmarks.
Achieves higher accuracy with fewer parameters and computations.
No extra inference cost due to structural re-parameterization.
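The last finding can be illustrated with a minimal sketch of structural re-parameterization. The exact GLoRA formulation is not reproduced on this page, so the example assumes a simpler LoRA-style setup: a frozen linear layer with a trainable low-rank weight update and a trainable bias shift. All names and sizes (`W0`, `b0`, `B`, `A`, `delta_b`, `d_in`, `d_out`, `rank`) are hypothetical. Because every added term is linear, the extra branches can be folded into a single weight and bias once training is done.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_in, d_out, rank = 6, 4, 2  # hypothetical sizes, chosen only for illustration

W0 = rand_matrix(d_out, d_in, rng)     # frozen pre-trained weight
b0 = rand_matrix(d_out, 1, rng)        # frozen pre-trained bias
B = rand_matrix(d_out, rank, rng)      # trainable low-rank factor (assumed)
A = rand_matrix(rank, d_in, rng)       # trainable low-rank factor (assumed)
delta_b = rand_matrix(d_out, 1, rng)   # trainable bias shift (assumed)
x = rand_matrix(d_in, 1, rng)          # one input column vector

# Training-time view: the frozen path and the adapter path run side by side.
y_train = add(add(matmul(W0, x), matmul(B, matmul(A, x))), add(b0, delta_b))

# Inference-time view: fold the adapter into one weight and one bias.
W_merged = add(W0, matmul(B, A))
b_merged = add(b0, delta_b)
y_infer = add(matmul(W_merged, x), b_merged)

# The two views agree up to floating-point rounding.
max_diff = max(abs(p[0] - q[0]) for p, q in zip(y_train, y_infer))
```

After merging, the layer has the same shape as the original one, which is why no additional inference cost is incurred in this simplified setting.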
Abstract
We present Generalized LoRA (GLoRA), an advanced approach for universal parameter-efficient fine-tuning tasks. Enhancing Low-Rank Adaptation (LoRA), GLoRA employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations, providing more flexibility and capability across diverse tasks and datasets. Moreover, GLoRA facilitates efficient parameter adaptation by employing a scalable, modular, layer-wise structure search that learns individual adapter of each layer. Originating from a unified mathematical formulation, GLoRA exhibits strong transfer learning, few-shot learning and domain generalization abilities, as it adapts to new tasks through not only weights but also additional dimensions like activations. Comprehensive experiments demonstrate that GLoRA outperforms all previous methods in natural, specialized, and structured vision benchmarks,…
Peer Reviews
Decision·Submitted to ICLR 2024
This paper is clearly presented and well organized. The authors also provide a detailed discussion of related works and variants. GLoRA draws inspiration from RepVGG, which introduces fusable parameters during training to improve model capacity. This can generally bring improvements without extra inference cost, as shown in the experiments. GLoRA offers a unified framework that includes multiple fine-tuning paradigms and provides a more generalized prompt module design per layer. The f
The authors claim that GLoRA can be "seamlessly integrate into the base network", but this design appears to apply to linear layers only. There are many other types of operators, such as convolution and normalization layers. How can GLoRA be combined with those layers? The evolutionary search (Sec 2.4) is crucial for GLoRA, as it decides which layer and scheme to use during fine-tuning. However, the details of the search and the final chosen paradigms are not clearly discussed in the main paper. As the abstract e
1. GLoRA has a re-parameterization design, which makes it more similar to LoRA than to Adapter. This makes GLoRA more flexible, since it does not need to change the structure of the original backbone, and it incurs no extra inference cost. 2. GLoRA integrates multiple methods and can achieve effects similar to most existing PEFT modules. 3. The authors conduct multiple experiments to demonstrate the generality and effectiveness of GLoRA.
1. It seems that GLoRA is not general, since it relies on an evolutionary search procedure to obtain suitable components. The idea is similar to Neural Prompt Search [1]. Unlike existing modules, GLoRA is not a fixed design, which might limit its practicality. 2. GLoRA has a large search space, which might incur large time costs. However, the authors have not reported the actual training time and memory cost of GLoRA, which are very important for PEFT modules. 3. The authors introduce multiple PEFT modu
GLoRA effectively consolidates previous parameter-efficient fine-tuning methods within Equation 10. Importantly, all adjustable support tensors are linear, which makes structural re-parameterization readily accessible. The paper highlights GLoRA's commendable capacity to generalize across diverse tasks, an invaluable quality in machine learning and a frequently challenging facet of model development.
Structural re-parameterization requires storing the full set of weights (including bias) for every individual downstream task. This means that as the number of these tasks increases, the storage needs can become prohibitively large. Although this approach might improve inference performance, the substantial storage overhead can be a major impediment for real-world deployment, especially when multiple tuned models are needed to handle different downstream tasks. The clarity of the paper is occas
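The storage concern raised in this review can be made concrete with a rough back-of-the-envelope count. The sketch below is not from the paper: the layer width, layer count, rank, and task count are hypothetical, and the per-task adapter is assumed to be a plain low-rank pair of matrices rather than the actual GLoRA support tensors. It compares storing one merged copy of every linear layer per task against storing one shared backbone plus a small adapter per task.

```python
def linear_params(d_in, d_out):
    """Parameters of one dense layer: weight matrix plus bias vector."""
    return d_in * d_out + d_out


def low_rank_params(d_in, d_out, rank):
    """Parameters of an assumed low-rank adapter: two factors of the given rank."""
    return rank * (d_in + d_out)


# Hypothetical configuration, chosen only to illustrate the scaling.
d_in = d_out = 768
rank = 4
num_layers = 48
num_tasks = 10

merged_per_task = num_layers * linear_params(d_in, d_out)
adapter_per_task = num_layers * low_rank_params(d_in, d_out, rank)

# Option 1: keep a fully merged set of weights for every task.
merged_total = num_tasks * merged_per_task

# Option 2: keep one shared backbone and only the adapters per task.
adapter_total = merged_per_task + num_tasks * adapter_per_task

ratio = merged_total / adapter_total
print(f"merged copies:      {merged_total:,} parameters")
print(f"backbone + adapters: {adapter_total:,} parameters")
print(f"ratio: {ratio:.1f}x")
```

Under these assumed numbers, the merged-copy option grows linearly with the full model size per task, while the adapter option grows only with the adapter size.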
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and ELM
Methods: Adapter
