Riemannian Federated Learning via Averaging Gradient Streams
Zhenwei Huang, Wen Huang, Pratik Jawanpuria, Bamdev Mishra

TL;DR
This paper introduces RFedAGS, a Riemannian federated learning algorithm that effectively handles partial participation and data heterogeneity, with proven convergence properties and strong empirical performance.
Contribution
The paper proposes RFedAGS, a novel Riemannian federated learning algorithm utilizing gradient stream averaging, addressing partial participation and heterogeneity challenges.
Findings
Proven global convergence and sublinear rates under decaying step sizes.
Convergence to a neighborhood of stationary points with fixed step sizes.
Demonstrated strong empirical performance on synthetic and real data.
Abstract
Federated learning (FL), as a distributed learning paradigm, has a significant advantage in addressing large-scale machine learning tasks. In the Euclidean setting, FL algorithms have been extensively studied with both theoretical and empirical success. However, few works investigate federated learning algorithms in the Riemannian setting. In particular, critical challenges such as partial participation and data heterogeneity among agents have not been explored in the Riemannian federated setting. This paper presents and analyzes a Riemannian FL algorithm, called RFedAGS, based on a new efficient server aggregation -- averaging gradient streams, which can simultaneously handle partial participation and data heterogeneity. We theoretically show that the proposed RFedAGS has global convergence and a sublinear convergence rate in the decaying step size case; and converges…
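The aggregation idea in the abstract can be illustrated with a toy sketch. This is not the authors' RFedAGS implementation: the manifold (unit sphere), the objective (a Rayleigh quotient per agent), the use of tangent-space projection as the vector transport, uniform sampling of participants, and all function names are assumptions chosen for illustration. Each sampled agent runs a few local Riemannian gradient steps and reports the sum of its gradients, moved to the tangent space at the server iterate; the server averages these streams and takes one retraction step with a decaying step size.

```python
import numpy as np

def proj(x, v):
    """Project v onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x

def retract(x, v):
    """Retraction on the sphere: move along v, then renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

def local_stream(x, A, step, local_steps):
    """Run local Riemannian gradient steps from the server iterate x and
    return the sum of the local gradients, each projected to the tangent
    space at x (projection stands in for a vector transport; assumption)."""
    z = x.copy()
    stream = np.zeros_like(x)
    for _ in range(local_steps):
        g = proj(z, -2.0 * A @ z)   # Riemannian gradient of f(z) = -z^T A z
        stream += proj(x, g)
        z = retract(z, -step * g)
    return stream

def run(num_agents=8, dim=5, rounds=200, local_steps=3, participants=4, seed=0):
    rng = np.random.default_rng(seed)
    base = np.diag(np.linspace(1.0, 3.0, dim))
    mats = []
    for _ in range(num_agents):
        S = rng.standard_normal((dim, dim))
        mats.append(base + 0.1 * (S + S.T) / 2)   # heterogeneous local data
    x0 = np.ones(dim) / np.sqrt(dim)
    x = x0.copy()
    for t in range(rounds):
        step = 0.1 / np.sqrt(t + 1)               # decaying step size
        # Partial participation: only a random subset of agents reports.
        chosen = rng.choice(num_agents, size=participants, replace=False)
        streams = [local_stream(x, mats[j], step, local_steps) for j in chosen]
        avg = np.mean(streams, axis=0)            # average the gradient streams
        x = retract(x, -step * avg)               # single server update
    return x0, x, mats
```

Calling `run()` returns the initial point, the final server iterate, and the hypothetical local matrices, so the Rayleigh quotient of the averaged matrix can be compared before and after training. In this sketch the server update needs only a projection and a retraction, with no inverse exponential map or parallel transport.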
Peer Reviews
Decision: ICLR 2026 Poster
* The proposed aggregation mechanism is novel and easy to follow.
* The paper provides comprehensive convergence analysis.
* The main experiments effectively demonstrate the proposed method’s effectiveness.
* While I understand the reasonableness of $G$, I am wondering what the value of $G$ would be when the true probabilities are not available to the server in the experiments.
* How are the data partitioned across clients? How many total clients are included in the experiments, and what is the client participation ratio?
* The ablation study is somewhat limited, and the sensitivity of several important parameters is missing, for example, different participation ratios, varying numbers of local step
This paper proposes and analyzes RFedAGS, a Riemannian federated learning algorithm that introduces a new and efficient server aggregation scheme based on averaging gradient streams. The method is designed to effectively handle both partial client participation and data heterogeneity. Theoretical analysis establishes that RFedAGS achieves global convergence and a sublinear convergence rate under decaying step sizes, and further converges sublinearly or linearly to a neighborhood of a stationary
1. Limited novelty. The key idea, aggregating gradient flows in tangent space, is conceptually straightforward once the FedAvg update is projected to a manifold setting.
2. The paper lacks an argument for why RFedAGS offers a distinct or superior geometric interpretation.
3. Limited Scope of Baselines: Although several strong Riemannian FL baselines are included (RFedAvg, RFedSVRG, RFedProj) for targeted tasks, more recent algorithms are not considered, e.g., Wang et al., 2025 [1].
4. Some results
1. The algorithm avoids the computational burden of exponential map inverses and parallel transport used in earlier Riemannian FL methods.
2. It is the first Riemannian FL method proven to work under arbitrary partial participation.
1. Theoretical clarity and novelty: While the proposed framework claims to generalize existing Riemannian FL methods by relaxing the requirements on retraction and vector transport, the theoretical advancement remains unclear. Specifically, the main difficulty in proving convergence under assumptions like 3.1, 3.2, and 3.5 is not explicitly articulated. The authors should clarify why convergence analysis becomes more challenging under generalized retraction and bounded vector transport, and in w
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Advanced Statistical Modeling Techniques
Methods: Softmax · Attention Is All You Need
