FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
Chuiyang Meng, Ming Tang, Vincent W.S. Wong

TL;DR
FLoRG introduces a federated fine-tuning framework that uses low-rank Gram matrix aggregation and Procrustes alignment to improve efficiency and consistency in large language model adaptation across distributed clients.
Contribution
The paper proposes a federated fine-tuning method that uses Gram matrix aggregation and Procrustes alignment to reduce communication costs and address decomposition drift.
Findings
Outperforms five baseline schemes in accuracy.
Reduces communication overhead by up to 2041×.
Provides theoretical convergence guarantees.
Abstract
Parameter-efficient fine-tuning techniques such as low-rank adaptation (LoRA) enable large language models (LLMs) to adapt to downstream tasks efficiently. Federated learning (FL) further facilitates this process by enabling collaborative fine-tuning across distributed clients without sharing private data. However, the use of two separate low-rank matrices in LoRA for federated fine-tuning introduces two types of challenges. First, aggregation error can arise from separately aggregating the two low-rank matrices. Second, even if the server aggregates the product of two low-rank matrices, it needs to decompose the aggregated matrix back into low-rank matrices. Since the decomposition is not unique, it can lead to decomposition drift. To tackle the aforementioned challenges, we propose federated low-rank Gram-matrix aggregation (FLoRG), a federated fine-tuning framework which employs a…
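The two challenges named in the abstract can be seen on a toy example. The sketch below is not taken from the paper: the two clients, the rank-1 factors `B` and `A`, and the helper names `matmul` and `mean` are all illustrative assumptions. It shows that averaging the LoRA factors separately differs from averaging their products, and that a given product does not determine its factors.

```python
# Toy illustration only (hypothetical values, not the FLoRG algorithm).

def matmul(X, Y):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mean(mats):
    """Element-wise average of equally shaped matrices."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / n for j in range(cols)] for i in range(rows)]

# Each client i holds an update W_i = B_i @ A_i with B_i (2x1) and A_i (1x2).
B = [[[1.0], [0.0]], [[0.0], [1.0]]]
A = [[[1.0, 0.0]], [[0.0, 1.0]]]

# Challenge 1: aggregate the two factors separately, then multiply.
separate = matmul(mean(B), mean(A))
# Reference: aggregate the products themselves.
exact = mean([matmul(b, a) for b, a in zip(B, A)])

print(separate)  # [[0.25, 0.25], [0.25, 0.25]]
print(exact)     # [[0.5, 0.0], [0.0, 0.5]]

# Challenge 2: the factorization is not unique. Rescaling the factors by
# c and 1/c leaves the product unchanged, so a server that decomposes an
# aggregated product has many equally valid choices of factors.
c = 2.0
B_alt = [[c * v for v in row] for row in B[0]]
A_alt = [[v / c for v in row] for row in A[0]]
same_product = matmul(B_alt, A_alt) == matmul(B[0], A[0])
print(same_product)  # True
```

The gap between `separate` and `exact` is the aggregation error. The freedom shown by `B_alt` and `A_alt` is what allows the decomposed factors to drift between rounds.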
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper presents a well-structured theoretical analysis with formal proofs, offering strong theoretical soundness and clear convergence guarantees. 2. It explores an interesting and under-studied problem—eliminating aggregation bias and decomposition drift in federated LoRA fine-tuning—introducing new insights into parameter-efficient federated learning. 3. The paper is clearly written, correctly annotated, and provides a thorough description of the proposed FLoRG framework, making it easy
1. The paper does not address the partial client participation scenario, which is common in practical federated learning settings. Evaluating FLoRG under varying client availability would strengthen its applicability. 2. The experiments are conducted only on OPT-125M and RoBERTa-large, which are relatively dated compared to current state-of-the-art LLMs such as LLaMA-3 and Qwen-2.5. Using more recent backbones would better demonstrate the scalability and relevance of FLoRG. 3. The paper reports
* The paper is well-written. The authors did a good job categorising and explaining the existing problems. * The algorithm performs better than the mentioned baselines. * The authors provide a convergence analysis for FLoRG.
* Clarity on Communication Savings: I would appreciate it if the authors explained the communication-saving part of their claim. Did they measure communication relative to full-matrix communication or to other federated LoRA methods? * Server-Side Computational Overhead: The paper does not discuss the server-side computational cost, which appears to be substantial, especially for the matrix decomposition and for solving the optimization problem. * The baselines are fairly basic. By just checking recent A
1. Bias-free aggregation with a single matrix. 2. The convergence bound tightens when alignment is used; ablations show sizeable accuracy gains from Procrustes alignment; headline communication savings to reach the target accuracy.
1. The approach relies on semi-orthogonal matrices L and R that are never updated; performance is sensitive to their initialization. 2. Each round requires, per layer, an eigendecomposition of Q and an SVD for Procrustes; scalability and latency with many layers or clients are not benchmarked.
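For context on the per-layer SVD mentioned in the reviews, the sketch below solves the generic orthogonal Procrustes problem with NumPy. It is an assumption-laden illustration, not FLoRG's alignment step: the function name `procrustes_rotation`, the matrix sizes, and the synthetic `reference` and `drifted` factors are all hypothetical.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Return the orthogonal matrix R that minimizes ||X @ R - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
r = 4  # hypothetical rank

# A reference factor, e.g. the one kept from a previous round.
reference = rng.standard_normal((16, r))

# A "drifted" factor: the same subspace expressed in a different basis.
# Multiplying by an orthogonal matrix leaves drifted @ drifted.T unchanged.
random_basis, _ = np.linalg.qr(rng.standard_normal((r, r)))
drifted = reference @ random_basis

# One r x r SVD recovers the rotation that maps the drifted factor back.
R = procrustes_rotation(drifted, reference)
aligned = drifted @ R

print(np.allclose(aligned, reference))                      # True
print(np.allclose(drifted @ drifted.T, aligned @ aligned.T))  # True
```

The SVD here is taken of an r × r matrix, so in this generic setting its cost depends on the rank rather than on the full layer width. Whether that holds for FLoRG's actual alignment step is not stated in the text above.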
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Mobile Crowdsensing and Crowdsourcing
