FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment

Chuiyang Meng; Ming Tang; Vincent W.S. Wong

arXiv:2602.17095·cs.LG·March 9, 2026

Open Access · 3 Reviews

TL;DR

FLoRG introduces a federated fine-tuning framework that uses low-rank Gram matrix aggregation and Procrustes alignment to improve efficiency and consistency in large language model adaptation across distributed clients.

Contribution

The paper proposes a federated fine-tuning method that reduces communication costs and addresses decomposition drift using Gram matrix aggregation and Procrustes alignment.

Findings

01. Outperforms five baseline schemes in accuracy.
02. Reduces communication overhead by up to 2041×.
03. Provides theoretical convergence guarantees.

Abstract

Parameter-efficient fine-tuning techniques such as low-rank adaptation (LoRA) enable large language models (LLMs) to adapt to downstream tasks efficiently. Federated learning (FL) further facilitates this process by enabling collaborative fine-tuning across distributed clients without sharing private data. However, the use of two separate low-rank matrices in LoRA for federated fine-tuning introduces two types of challenges. First, aggregation error can arise from separately aggregating the two low-rank matrices. Second, even if the server aggregates the product of two low-rank matrices, it needs to decompose the aggregated matrix back into low-rank matrices. Since the decomposition is not unique, it can lead to decomposition drift. To tackle the aforementioned challenges, we propose federated low-rank Gram-matrix aggregation (FLoRG), a federated fine-tuning framework which employs a…
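
The two challenges named in the abstract can be illustrated with a small NumPy sketch. Everything in it is an assumption for illustration (the toy shapes, the client count, and names such as `B_list`, `A_list`, and `M` do not come from the paper). It shows that averaging the two LoRA factors separately generally differs from averaging their products, and that a low-rank factorization is not unique, since any invertible `M` yields another valid pair of factors.

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, d_out, d_in, rank = 4, 8, 6, 2  # toy sizes, chosen arbitrarily

# Each client holds its own LoRA factors B_i (d_out x rank) and A_i (rank x d_in).
B_list = [rng.standard_normal((d_out, rank)) for _ in range(num_clients)]
A_list = [rng.standard_normal((rank, d_in)) for _ in range(num_clients)]

# Option 1: average the factors separately, then multiply.
separate = np.mean(B_list, axis=0) @ np.mean(A_list, axis=0)

# Option 2: average the products, i.e. the true mean of the client updates.
exact = np.mean([B @ A for B, A in zip(B_list, A_list)], axis=0)

aggregation_error = np.linalg.norm(separate - exact)

# Non-uniqueness: (B M, M^{-1} A) gives the same product for any invertible M.
B, A = B_list[0], A_list[0]
M = np.array([[2.0, 1.0], [0.0, 0.5]])  # hypothetical invertible matrix (det = 1)
B_alt, A_alt = B @ M, np.linalg.solve(M, A)
same_product = bool(np.allclose(B @ A, B_alt @ A_alt))
factors_differ = not np.allclose(B, B_alt)

print(f"aggregation error (Frobenius norm): {aggregation_error:.3f}")
print(f"same product: {same_product}, factors differ: {factors_differ}")
```

The second half is the reason a server that decomposes an aggregated product has no canonical choice of factors: the product is fixed, but the factors can change from round to round.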

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The paper presents a well-structured theoretical analysis with formal proofs, offering strong theoretical soundness and clear convergence guarantees.
2. It explores an interesting and under-studied problem (eliminating aggregation bias and decomposition drift in federated LoRA fine-tuning), introducing new insights into parameter-efficient federated learning.
3. The paper is clearly written, correctly annotated, and provides a thorough description of the proposed FLoRG framework, making it easy

Weaknesses

1. The paper does not address the partial client participation scenario, which is common in practical federated learning settings. Evaluating FLoRG under varying client availability would strengthen its applicability.
2. The experiments are conducted only on OPT-125M and RoBERTa-large, which are relatively dated compared to current state-of-the-art LLMs such as LLaMA-3 and Qwen-2.5. Using more recent backbones would better demonstrate the scalability and relevance of FLoRG.
3. The paper reports

Reviewer 02 · Rating 4 · Confidence 5

Strengths

* The paper is well-written. The authors did a good job categorising and explaining the existing problems.
* The algorithm performs better than the mentioned baselines.
* The authors provide a convergence analysis for FLoRG.

Weaknesses

* Clarity on Communication Saving: I would appreciate it if the authors explained the communication saving part of their claim. Did they measure the communication compared to full matrix communication or other Federated LoRA methods?
* Server-Side Computational Overhead: The paper does not discuss the server-side computational cost, which appears to be substantial, especially doing matrix decomposition and solving optimization.
* The baselines are considerably basic. By just checking recent A

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. Bias-free aggregation with one matrix.
2. Convergence bound tightens when alignment is used; ablations show sizeable accuracy gains from Procrustes; headline comms savings to target accuracy.

Weaknesses

1. The approach relies on semi-orthogonal L,R that never update; performance is sensitive to their initialization.
2. Each round per layer requires eigendecomposition of Q and an SVD for Procrustes; scalability or latency with many layers or clients isn’t benchmarked.
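
The per-round cost raised in the second weakness comes from two standard linear-algebra steps, sketched below in NumPy. This is a generic illustration of the operations, not the paper's algorithm. It assumes `Q` is a symmetric positive semidefinite Gram matrix of low rank, that a factor `F` with `F.T @ F = Q` is recovered by eigendecomposition, and that the orthogonal Procrustes step rotates the new factor toward a reference factor from the previous round. The helper names `factor_from_gram` and `procrustes_rotation` and all sizes are hypothetical.

```python
import numpy as np

def factor_from_gram(Q, rank):
    """Return F (rank x n) with F.T @ F approximating Q, from the top eigenpairs."""
    eigvals, eigvecs = np.linalg.eigh(Q)           # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:rank]         # indices of the largest ones
    lam = np.clip(eigvals[top], 0.0, None)         # guard against tiny negatives
    return np.sqrt(lam)[:, None] * eigvecs[:, top].T

def procrustes_rotation(F_new, F_ref):
    """Orthogonal Omega minimizing ||Omega @ F_new - F_ref||_F (via one SVD)."""
    U, _, Vt = np.linalg.svd(F_ref @ F_new.T)
    return U @ Vt

rng = np.random.default_rng(0)
rank, n = 3, 10                                    # toy sizes, chosen arbitrarily
F_ref = rng.standard_normal((rank, n))             # stand-in for last round's factor
F_true = F_ref + 0.01 * rng.standard_normal((rank, n))
Q = F_true.T @ F_true                              # stand-in for the aggregated Gram matrix

F_new = factor_from_gram(Q, rank)                  # valid factor, arbitrary rotation
Omega = procrustes_rotation(F_new, F_ref)
F_aligned = Omega @ F_new                          # same Gram matrix, rotated factor

drift_before = np.linalg.norm(F_new - F_ref)
drift_after = np.linalg.norm(F_aligned - F_ref)
print(f"distance to reference before alignment: {drift_before:.3f}")
print(f"distance to reference after alignment:  {drift_after:.3f}")
```

Because `Omega` is orthogonal, `F_aligned.T @ F_aligned` still equals `Q`, so the alignment changes only which of the equivalent factors is kept. In this sketch the eigendecomposition costs on the order of `n**3` and the SVD on the order of `rank**3` per matrix, which is the kind of per-layer work the reviewer asks to see benchmarked.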

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Mobile Crowdsensing and Crowdsourcing