Low-Rank Interconnected Adaptation across Layers

Yibo Zhong; Jinman Zhao; Yao Zhou

arXiv:2407.09946 · cs.CV · May 30, 2025

TL;DR

Lily introduces a low-rank interconnected adaptation framework that enhances parameter efficiency and expressiveness in fine-tuning large models by sharing components across layers and using data-dependent routing.

Contribution

It proposes a novel PEFT method with interconnected low-rank adapters, improving adaptation performance and efficiency over traditional LoRA.

Findings

1. Lily outperforms existing methods across various tasks and models.
2. The interconnected structure allows higher-rank updates with fewer parameters.
3. Data-dependent routing enhances cross-domain adaptability.
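The parameter argument behind the second finding can be illustrated with a rough count. This sketch makes several assumptions:

- Each adapted layer gets its own A matrix (d_in × r).
- All layers share a pool of N B matrices (r × d_out).
- Each layer has a small r × N router.

This accounting, the function names, and the example sizes are hypothetical. None of them come from the paper or its repository. The abstract describes A as locally shared, which would lower the count further. The sketch compares LoRA at rank r with the largest Lily-style rank that fits in the same parameter budget.

```python
# Hypothetical parameter accounting; not taken from the paper or yibozhong/lily.

def lora_params(n_layers, d_in, d_out, r):
    """LoRA: one A (d_in x r) and one B (r x d_out) per adapted layer."""
    return n_layers * r * (d_in + d_out)


def lily_params(n_layers, d_in, d_out, r, n_experts):
    """Assumed Lily-style count: one A per layer, n_experts B matrices
    shared by all layers, and one r x n_experts router per layer."""
    return n_layers * r * d_in + n_experts * r * d_out + n_layers * r * n_experts


def max_lily_rank(budget, n_layers, d_in, d_out, n_experts):
    """Largest rank whose Lily-style count fits in `budget`.
    The count is linear in r, so divide by the cost of one unit of rank."""
    per_rank = lily_params(n_layers, d_in, d_out, 1, n_experts)
    return budget // per_rank


budget = lora_params(n_layers=24, d_in=1024, d_out=1024, r=8)
print(budget)  # 393216
print(max_lily_rank(budget, n_layers=24, d_in=1024, d_out=1024, n_experts=4))  # 13
```

Each layer's update is A_l multiplied by a matrix with r' rows, so its rank is still at most r'. Under this accounting, the higher-rank claim comes from spending the parameters saved on B on a larger r' (13 instead of 8 for these illustrative sizes).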

Abstract

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning (PEFT) method that learns weight updates $\Delta W = AB$ for pretrained weights $W$ through low-rank adapters $A$ and $B$. While LoRA ensures hardware efficiency, its low-rank weight updates limit adaptation performance. In this paper, we propose low-rank interconnected adaptation across layers (Lily), a novel PEFT method that introduces an interconnected framework with locally shared $A$ and globally shared $B$ experts. This structure eliminates redundant per-layer $AB$ pairs, enabling higher-rank $\Delta W$ with equal or fewer parameters. To enhance expressiveness, we use data-dependent routers to determine $A$-$B$ interconnections, preventing $B$ experts from converging to the same behavior and improving representational power across domains. Experiments across modalities, architectures, and model sizes…
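The update described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation, which lives in yibozhong/lily. It rests on these assumptions:

- Inputs are row vectors, so x ΔW = (x A) B matches ΔW = AB.
- The router is a linear map on the low-dimensional features x A_l, followed by a softmax.
- The mixture is taken over a list of shared B experts.

The helper names (matmul, softmax, lora_delta, lily_delta) are hypothetical.

```python
import math
import random

# Hypothetical sketch of a LoRA vs. Lily-style update; not the official code.


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in v]
    total = sum(exps)
    return [e / total for e in exps]


def lora_delta(x, A, B):
    """LoRA update for row vector x: (x A) B, i.e. x times Delta W = A B."""
    return matmul(matmul([x], A), B)[0]


def lily_delta(x, A_l, router, B_experts):
    """Assumed Lily-style update for one layer: x A_l (sum_i w_i(x) B_i).

    The router scores each shared B expert from the low-dimensional features
    x A_l. The expert outputs are then mixed with the softmax weights, which
    by linearity equals applying the mixed matrix sum_i w_i B_i.
    """
    z = matmul([x], A_l)[0]                      # length-r features
    weights = softmax(matmul([z], router)[0])    # one weight per B expert
    if len(weights) != len(B_experts):
        raise ValueError("router must produce one score per B expert")
    outs = [matmul([z], B)[0] for B in B_experts]
    return [sum(w * out[j] for w, out in zip(weights, outs)) for j in range(len(outs[0]))]


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_in, d_out, r, n_experts = 6, 4, 2, 3
x = [random.gauss(0.0, 1.0) for _ in range(d_in)]
A_l = random_matrix(d_in, r)                 # this layer's A (down-projection)
router = random_matrix(r, n_experts)         # this layer's router weights
B_experts = [random_matrix(r, d_out) for _ in range(n_experts)]  # shared B pool
print(lily_delta(x, A_l, router, B_experts))
```

With a single expert, the softmax weight is exactly 1 and the update reduces to LoRA's x A B. The data-dependent part comes only from the router weights w_i(x).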

Peer Reviews

Decision: Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 4

Strengths

- Overall, this paper is well-written. The authors provide a comprehensive description of the proposed method, and the appendix is detailed.
- The authors apply the proposed method to various models across different modalities, demonstrating its effectiveness. They include both quantitative and qualitative evaluations.
- The proposed method is inspired by a thorough analysis of the existing LoRA approach.

Weaknesses

- Since the authors derived the forward path of the proposed method in Section 3.2, would it be possible to analyze the rank gain? A theoretical analysis of the improvement assuming a single layer could be helpful.
- It would be beneficial if the authors could further evaluate the improvement qualitatively, as the current gains are not particularly substantial. For example, in the RoBERTa-base experiments, the proposed Lily method only outperforms others on 2 out of all tasks. Additionally, Ada

Reviewer 02 · Rating 5 · Confidence 2

Strengths

1. The proposed methodology that considers cross-layer interactions is new to me.
2. The empirical results show the method's potential in various domains.

Weaknesses

The major flaws are in the presentation.

1. Some derivation steps could be merged so that only the important steps are shown. The space saved could be used for discussing the intuitions and analysis.
2. The fonts in the figures are too small. Normally they should be larger than the smallest font in the main body.

Reviewer 03 · Rating 3 · Confidence 4

Strengths

1. The motivation is reasonable, as learning multiple shared low-rank matrices can represent various low-rank subspaces, and their combinations have the potential to capture information across a higher-rank space.
2. Comprehensive experiments demonstrate the efficacy and adaptability of Lily, achieving state-of-the-art results across a diverse set of tasks in both language and vision domains.
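The point about combining low-rank matrices can be checked with a small exact example. A sum of products A_i B_i with different A matrices can reach rank N·r. A single A multiplied by any mixture of B matrices stays capped at r, because rank(A M) ≤ rank(A) ≤ r. Which case a given Lily configuration falls into depends on how the A-B interconnections are wired, and this page does not fully specify that. The matrices below are hand-picked for illustration. The helpers (matmul, add, rank) are hypothetical names, not part of any library.

```python
from fractions import Fraction

# Hypothetical helpers for an exact rank comparison (illustration only).


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y):
    """Elementwise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def rank(M):
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    found = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(found, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for i in range(len(rows)):
            if i != found and rows[i][c] != 0:
                factor = rows[i][c] / rows[found][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[found])]
        found += 1
    return found


# Two rank-2 adapters (r = 2) on a 4 x 4 weight.
A1 = [[1, 0], [0, 1], [0, 0], [0, 0]]
B1 = [[1, 0, 0, 0], [0, 1, 0, 0]]
A2 = [[0, 0], [0, 0], [1, 0], [0, 1]]
B2 = [[0, 0, 1, 0], [0, 0, 0, 1]]

separate = add(matmul(A1, B1), matmul(A2, B2))  # different A's: A1 B1 + A2 B2
shared = matmul(A1, add(B1, B2))                # one A, mixed B's: A1 (B1 + B2)
print(rank(separate))  # 4
print(rank(shared))    # 2
```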

Weaknesses

1. The presentation of the methodology has significant room for improvement.
   - The description of the methodologies (lines 204-210) and the framework depicted in Figure 1 within the main text differ from the final implementation discussed in Appendix A.1.1 (lines 770-778). In the main text, low-dimensional projectors (LPs) are described as being tied to each layer of a module, while high-dimensional projectors (HPs) are shared across the model. However, Appendix A.1.1 indicates that Lily

Code & Models

Repositories

yibozhong/lily
PyTorch · Official

Taxonomy

Topics: Thin-Film Transistor Technologies

Methods: Adapter · Mixture of Experts