Heterogeneous Federated Fine-Tuning with Parallel One-Rank Adaptation

Zikai Zhang; Rui Hu; Jiahao Xu

arXiv:2602.16936·cs.DC·February 20, 2026

Open Access 3 Reviews

TL;DR

Fed-PLoRA is a new federated fine-tuning framework for large language models that effectively handles client heterogeneity by using parallel one-rank adaptation modules, improving accuracy and efficiency.

Contribution

Introduces Fed-PLoRA, a lightweight federated fine-tuning method with parallel one-rank adaptation and a novel folding strategy for heterogeneous client resources.

Findings

1. Outperforms existing methods in accuracy.
2. Enhances efficiency in federated fine-tuning.
3. Addresses heterogeneity challenges effectively.

Abstract

Large Language Models (LLMs) have demonstrated remarkable effectiveness in adapting to downstream tasks through fine-tuning. Federated Learning (FL) extends this capability by enabling collaborative fine-tuning across distributed clients using Low-Rank Adaptation (LoRA), while preserving data privacy by avoiding raw data sharing. However, practical deployments face challenges when clients have heterogeneous resources and thus adopt different LoRA ranks, leading to substantial initialization and aggregation noise that undermines performance. To address these challenges, we propose Fed-PLoRA, a novel lightweight heterogeneous federated fine-tuning (FFT) framework. Fed-PLoRA introduces Parallel One-Rank Adaptation (PLoRA), a new LoRA variant that replaces the classic multi-rank LoRA module with multiple parallel one-rank modules, and a novel Select-N-Fold strategy that folds untrained…
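The core PLoRA idea in the abstract, replacing one rank-R LoRA module with R parallel one-rank modules, can be illustrated with a minimal sketch. It assumes the usual LoRA update of the form B·A and uses hypothetical helper names (`lora_update`, `plora_update`) that do not come from the paper. It only checks that summing R rank-1 outer products reproduces the rank-R product; it does not implement Select-N-Fold or any federated aggregation.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def outer(u, v):
    """Outer product u v^T of two vectors."""
    return [[ui * vj for vj in v] for ui in u]


def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def lora_update(B, A):
    """Standard LoRA update (assumed form): B is d_out x R, A is R x d_in."""
    return matmul(B, A)


def plora_update(B, A):
    """Sum of R parallel rank-1 updates b_k a_k^T (hypothetical helper)."""
    d_out, d_in = len(B), len(A[0])
    total = [[0.0] * d_in for _ in range(d_out)]
    for k in range(len(A)):
        b_k = [row[k] for row in B]  # k-th column of B
        a_k = A[k]                   # k-th row of A
        total = add(total, outer(b_k, a_k))
    return total


def max_abs_diff(X, Y):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))


# Illustrative sizes only; these are not taken from the paper.
rng = random.Random(0)
d_out, d_in, R = 4, 5, 3
B = [[rng.uniform(-1, 1) for _ in range(R)] for _ in range(d_out)]
A = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(R)]

gap = max_abs_diff(lora_update(B, A), plora_update(B, A))
print(gap < 1e-12)
```

Because each rank-1 term is independent of the others, a client with a smaller budget could in principle train only a subset of the R terms, which is the property the heterogeneous setting relies on.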

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 4·Confidence 1

Strengths

1. This paper precisely defines initialization and aggregation noise in heterogeneous LoRA settings and shows Fed-PLoRA removes the former and reduces the latter.
2. PLoRA’s parallel rank-1 decomposition is mathematically equivalent to standard LoRA yet naturally supports heterogeneity; paired with Select-N-Fold, it guarantees zero initialization noise while curbing aggregation noise.
3. Strong and robust empirical results. Fed-PLoRA consistently outperforms FLoRA/FlexLoRA/HETLoRA across tasks

Weaknesses

1. Because the method broadcasts all R parallel rank-1 modules to every client and asks clients to keep folded modules, downlink traffic and on-device storage could become non-trivial in weak-network or mobile scenarios.
2. The paper’s empirical validation relies on relatively small or outdated and non-unified base models, which limits generalizability; it would be stronger to standardize on modern backbones like Qwen3 and Llama 3.2 across multiple sizes.
3. Its benchmark suite skews toward easier tasks

Reviewer 02·Rating 6·Confidence 4

Strengths

* The paper is well-written and the authors did a good job explaining the existing problems.
* Through various settings and different empirical results the authors show the merits of their algorithms.
* The authors did a proper ablation study, explaining the importance of each component.

Weaknesses

The "Select-N-Fold" strategy has one clear limitation that the paper understates: downlink communication cost.

Reviewer 03·Rating 6·Confidence 5

Strengths

- **Clear and Compelling Motivation:** The paper is exceptionally well-motivated. It formalizes the *specific failure modes* of prior art: initialization noise and aggregation noise. The entire paper is a clear and focused effort to solve these two problems.
- **Strong Theoretical Analysis:** The noise analysis theoretically proves that the proposed Fed-PLoRA framework eliminates initialization noise and provides a powerful and fundamental justification for the method's design.
- **Simple and Ef

Weaknesses

- Adding pseudocode for the algorithm would improve clarity.
- It appears that PLoRA requires downloading the entire global LoRA model. In contrast, other methods (if rank is publicly available) can use much smaller downloads. Although Section 4.2 addresses this, the R − r_i downlink cost could be quite high when R is large and r_i is small, potentially causing synchronization issues.
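
The R − r_i downlink concern can be made concrete with a rough per-layer parameter count. This sketch assumes each rank-1 module for a d_out × d_in weight matrix carries d_out + d_in parameters, and that a rank-matched baseline downloads exactly r_i such modules. The function names and layer sizes are hypothetical and are not taken from the paper or the reviews.

```python
def rank_one_params(d_out, d_in):
    """Parameters in one rank-1 module: a column of length d_out plus a row of length d_in."""
    return d_out + d_in


def downlink_params(num_modules, d_out, d_in):
    """Parameters sent to a client that downloads `num_modules` rank-1 modules."""
    return num_modules * rank_one_params(d_out, d_in)


def extra_downlink(R, r_i, d_out, d_in):
    """Extra parameters from downloading all R modules instead of only r_i."""
    if not 0 <= r_i <= R:
        raise ValueError("expected 0 <= r_i <= R")
    return downlink_params(R, d_out, d_in) - downlink_params(r_i, d_out, d_in)


# Hypothetical sizes for a single adapted layer (assumption, not from the paper).
d_out, d_in = 4096, 4096
R, r_i = 64, 4

full = downlink_params(R, d_out, d_in)       # all R modules
matched = downlink_params(r_i, d_out, d_in)  # only the client's own r_i modules
extra = extra_downlink(R, r_i, d_out, d_in)  # (R - r_i) * (d_out + d_in)

print(full, matched, extra)
```

With these illustrative numbers the full download is 16 times the rank-matched one, which is the large-R, small-r_i regime the reviewer describes.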

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning · Big Data and Digital Economy