Low-Rank Interconnected Adaptation across Layers
Yibo Zhong, Jinman Zhao, Yao Zhou

TL;DR
Lily introduces a low-rank interconnected adaptation framework that enhances parameter efficiency and expressiveness in fine-tuning large models by sharing components across layers and using data-dependent routing.
Contribution
It proposes a novel PEFT method with interconnected low-rank adapters, improving adaptation performance and efficiency over traditional LoRA.
Findings
Lily outperforms existing methods across various tasks and models.
The interconnected structure allows higher-rank updates with fewer parameters.
Data-dependent routing enhances cross-domain adaptability.
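The claim that sharing components frees budget for a higher rank can be checked with simple parameter counting. The sketch below is illustrative and not taken from the paper: it assumes one local `A` per layer, a pool of `num_experts` global `B` matrices, and a small per-layer router of shape `rank x num_experts`. The layer count and dimensions are hypothetical.

```python
def lora_params(num_layers, d_in, d_out, rank):
    # Standard LoRA: every layer owns its own A (d_in x r) and B (r x d_out).
    return num_layers * rank * (d_in + d_out)


def shared_params(num_layers, d_in, d_out, rank, num_experts):
    # Assumed layout: one local A per layer, a global pool of B experts,
    # and one linear router (r x num_experts) per layer.
    local_a = num_layers * d_in * rank
    global_b = num_experts * rank * d_out
    routers = num_layers * rank * num_experts
    return local_a + global_b + routers


def max_rank_within_budget(budget, num_layers, d_in, d_out, num_experts):
    # Every term in shared_params is linear in rank, so integer division works.
    per_rank = shared_params(num_layers, d_in, d_out, 1, num_experts)
    return budget // per_rank


# Hypothetical sizes: 12 layers, 768-dimensional projections, LoRA rank 8.
budget = lora_params(num_layers=12, d_in=768, d_out=768, rank=8)
rank = max_rank_within_budget(budget, 12, 768, 768, num_experts=4)
print(budget, rank, shared_params(12, 768, 768, rank, 4))
```

Under these assumptions the shared layout fits a larger rank inside the LoRA budget, because the `B` matrices are no longer duplicated in every layer.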
Abstract
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning (PEFT) method that learns weight updates $\Delta W = AB$ for pretrained weights $W$ through low-rank adapters $A$ and $B$. While LoRA ensures hardware efficiency, its low-rank weight updates limit adaptation performance. In this paper, we propose low-rank interconnected adaptation across layers (Lily), a novel PEFT method that introduces an interconnected framework with locally shared $A$ and globally shared $B$ experts. This structure eliminates redundant per-layer $AB$ pairs, enabling higher-rank $\Delta W$ with equal or fewer parameters. To enhance expressiveness, we use data-dependent routers to determine $A$-$B$ interconnections, preventing $B$ experts from converging to the same behavior and improving representational power across domains. Experiments across modalities, architectures, and model sizes…
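The interconnection described in the abstract can be sketched as a forward pass in which a layer-local down-projection feeds a router that mixes globally shared up-projections. This is a minimal NumPy sketch, not the authors' implementation. The function name `lily_like_forward`, the per-token softmax router `R`, and all tensor shapes are assumptions for illustration; the source only states that routing is data-dependent.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def lily_like_forward(x, W, A, B_experts, R, scale=1.0):
    """Frozen weight plus a routed low-rank update (hypothetical layout).

    x: (n, d_in) inputs          W: (d_in, d_out) frozen pretrained weight
    A: (d_in, r) layer-local     B_experts: (n_e, r, d_out) shared across layers
    R: (r, n_e) layer-local router (assumed to act on the low-dim features)
    """
    z = x @ A                    # (n, r) low-dimensional projection
    weights = softmax(z @ R)     # (n, n_e) data-dependent mixing weights
    # Per-token mixture of experts: sum_e weights[n, e] * (z[n] @ B_experts[e]).
    delta = np.einsum("nr,ne,erd->nd", z, weights, B_experts)
    return x @ W + scale * delta


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
W = rng.normal(size=(16, 16))
A = rng.normal(size=(16, 4))
B_experts = rng.normal(size=(3, 4, 16))
R = rng.normal(size=(4, 3))
out = lily_like_forward(x, W, A, B_experts, R)
print(out.shape)
```

With a single expert the routing weights collapse to 1 and the update reduces to the ordinary LoRA form `x @ A @ B`, which makes a convenient sanity check for the sketch.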
Peer Reviews
Decision·Submitted to ICLR 2025
- Overall, this paper is well-written. The authors provide a comprehensive description of the proposed method, and the appendix is detailed.
- The authors apply the proposed method to various models across different modalities, demonstrating its effectiveness. They include both quantitative and qualitative evaluations.
- The proposed method is inspired by a thorough analysis of the existing LoRA approach.
- Since the authors derived the forward path of the proposed method in Section 3.2, would it be possible to analyze the rank gain? A theoretical analysis of the improvement assuming a single layer could be helpful.
- It would be beneficial if the authors could further evaluate the improvement qualitatively, as the current gains are not particularly substantial. For example, in the RoBERTa-base experiments, the proposed Lily method only outperforms others on 2 out of all tasks. Additionally, Ada
1. The proposed methodology that considers cross-layer interactions is new to me.
2. The empirical results show the method's potential in various domains.
The major flaws are about the presentation. 1. Some derivation steps could be merged and only the important steps can be given. The blanks could be reserved for discussing the intuitions and analysis. 2. The fonts in the figures are too small. Normally they should be larger than the smallest font in the main body.
1. The motivation is reasonable as learning multiple shared low-rank matrices can represent various low-rank subspaces, and their combinations have the potential to capture information across a higher-rank space.
2. Comprehensive experiments demonstrate the efficacy and adaptability of Lily, achieving state-of-the-art results across a diverse set of tasks in both language and vision domains.
1. The presentation of the methodology has significant room for improvement. 1. The description of the methodologies (lines 204-210) and the framework depicted in Figure 1 within the main text differ from the final implementation discussed in Appendix A.1.1 (lines 770-778). In the main text, low-dimensional projectors (LPs) are described as being tied to each layer of a module, while high-dimensional projectors (HPs) are shared across the model. However, Appendix A.1.1 indicates that Lily
Taxonomy
Topics: Thin-Film Transistor Technologies
Methods: Adapter · Mixture of Experts
