MiSS: Revisiting the Trade-off in LoRA with an Efficient Shard-Sharing Structure

Jiale Kang; Qingyu Yin

arXiv:2409.15371 · cs.CL · December 15, 2025

TL;DR

MiSS introduces a shard-sharing structure for LoRA that improves convergence speed and balances performance, memory, and efficiency, supported by theoretical and empirical evidence.

Contribution

Proposes Matrix Shard Sharing (MiSS) and MiSS$^e$, novel methods that enhance LoRA's convergence and efficiency while maintaining performance.

Findings

01 Reduces optimization complexity without performance loss

02 Achieves better trade-offs among performance, memory, and efficiency

03 Occupies a favorable position on the Pareto frontier among PEFT methods
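
As a rough illustration of what a position on the Pareto frontier means here, the sketch below keeps only the candidates that no other candidate beats on both axes at once. The method names and numbers are hypothetical placeholders, not results from the paper, and using two axes (score and memory) is an assumption made for brevity.

```python
def pareto_front(points):
    """Return the names of points that no other point dominates.

    Each point is (name, score, memory). Higher score is better and
    lower memory is better. A point is dominated if another point is
    at least as good on both axes and strictly better on one.
    """
    front = []
    for name, score, memory in points:
        dominated = any(
            (s2 >= score and m2 <= memory) and (s2 > score or m2 < memory)
            for _, s2, m2 in points
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical placeholder values, NOT measurements from the paper.
candidates = [
    ("method_a", 0.60, 10.0),
    ("method_b", 0.70, 12.0),
    ("method_c", 0.55, 11.0),
    ("method_d", 0.65, 9.0),
]
front = pareto_front(candidates)
print(front)
```

With these placeholder values, `method_a` and `method_c` are dominated, while `method_b` and `method_d` remain because each is best on one axis.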

Abstract

Low-Rank Adaptation (LoRA) is a widely adopted technique for parameter-efficient fine-tuning, but its slow convergence has spurred the development of numerous variants. Nevertheless, existing methods often fail to improve performance, memory footprint, and computational efficiency simultaneously. To address this challenge, we revisit the causes of LoRA's slow convergence. Building on these insights, we propose Matrix Shard Sharing (MiSS), which updates shards of the original weight matrix using a single shared trainable matrix $D$, initialized to zeros. To simultaneously ensure computational efficiency, low memory footprint, and scalable serving, we introduce MiSS$^e$. Both theoretical analysis and empirical results demonstrate that our method reduces optimization complexity without compromising performance, thereby achieving a more favorable trade-off among performance,…
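
A minimal NumPy sketch of the shard-sharing idea described in the abstract follows. The abstract states only that shards of the weight matrix are updated with one shared trainable matrix $D$ initialized to zeros; the sharding along the input dimension, the shard height `r`, and the helper names `expand_shared_shard` and `shard_sharing_forward` are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def expand_shared_shard(D, num_shards):
    """Stack the shared matrix D (r x d_out) once per row shard."""
    return np.tile(D, (num_shards, 1))


def shard_sharing_forward(x, W, D):
    """Compute x @ (W + dW), where every r-row shard of W gets the same update D.

    Assumption for this sketch: W is split into shards along its
    input dimension, and r divides d_in.
    """
    d_in, _ = W.shape
    r = D.shape[0]
    assert d_in % r == 0, "sketch assumes r divides d_in"
    dW = expand_shared_shard(D, d_in // r)
    return x @ (W + dW)


rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
W = rng.standard_normal((d_in, d_out))   # frozen pretrained weight
x = rng.standard_normal((3, d_in))       # a small batch of inputs

# Zero initialization, as stated in the abstract: the update starts at zero,
# so the adapted layer initially matches the frozen layer.
D = np.zeros((r, d_out))
y0 = shard_sharing_forward(x, W, D)
base = x @ W

# With a nonzero D, the tiled update has rank at most r.
D_rand = rng.standard_normal((r, d_out))
dW = expand_shared_shard(D_rand, d_in // r)
rank = np.linalg.matrix_rank(dW)

trainable_params = D.size                # r * d_out in this sketch
```

Under these assumptions the only trainable tensor is `D`, and because the update is `D` repeated across shards, its rank cannot exceed `r`.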

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

The proposed method is remarkably simple and easy to implement, yet it demonstrates strong practical effectiveness. In several experimental settings, the method achieves notable improvements over LoRA. For instance, on the Mistral-7B model, MiSS outperforms LoRA by approximately 15%, highlighting its potential as a competitive and efficient alternative for parameter-efficient fine-tuning.

Weaknesses

Firstly, although the paper states that MiSS is motivated by theoretical analysis, the practical method itself is presented without clear theoretical justification or development. The coherence and clarity of the paper would be greatly improved by adding a dedicated subsection in Section 4 that discusses the theoretical motivations behind the architectural design, which appears to be a novel choice aimed at ensuring the low-rank condition. Secondly, since the paper seeks to propose a practical

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. Quality: MiSS effectively addresses LoRA's limitations by improving convergence without sacrificing efficiency, as supported by theoretical insights and Pareto analysis, making it a practical advancement in PEFT.
2. Soundness: The paper includes detailed comparisons with variants like PiSSA and LoRA-GA, covering multiple dimensions (performance, memory, compute), which strengthens the claims.

Weaknesses

1. Reliance on zero-initialized $D$ may limit adaptability in certain scenarios, potentially requiring further tuning.
2. Evaluations are primarily on language tasks; broader domains (e.g., vision or multimodal) are not explored.
3. The Pareto frontier mapping is insightful but could be more granular, e.g., with statistical significance tests.

Reviewer 03 · Rating 4 · Confidence 4

Strengths

The proposed method is very pragmatic and practical. The proposed method was examined in comparison with different LoRA variations in several benchmarks.

Weaknesses

Notation should be revised and redundancy in some terms should be fixed. The paper states: "Through theoretical analyses and empirical results, our method reduces optimization complexity while maintaining strong performance, striking a favorable balance between performance, memory, and efficiency." That is, one of the main claims is the theoretical analysis of the proposed methods. However, this is not well explored in the paper.

Code & Models

Models

Seikaijyu/RWKV6-3B-Chn-UnlimitedRP-mini-chat (model, ♡ 19)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques