Topology-Aware Knowledge Propagation in Decentralized Learning
Mansi Sakarvadia, Nathaniel Hudson, Tian Li, Ian Foster, Kyle Chard

TL;DR
This paper investigates how knowledge, especially out-of-distribution (OOD) data, propagates in decentralized learning systems and proposes topology-aware aggregation strategies that improve this propagation by 123% on average, substantially enhancing model accuracy.
Contribution
The paper identifies challenges in propagating out-of-distribution knowledge in decentralized learning and introduces topology-aware aggregation methods to address these issues.
Findings
Topology-aware strategies improve OOD knowledge propagation by 123% on average.
Propagation of OOD knowledge is heavily influenced by topology and data location.
Popular algorithms struggle with effective OOD knowledge dissemination across devices.
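The summary does not specify how the Degree and Betweenness strategies weight neighbors. The sketch below shows one possible reading: each device averages its own model with its neighbors' models, using weights proportional to each model owner's centrality score. The weighting rule and the helper names (`degree_centrality`, `centrality_weighted_aggregate`) are hypothetical illustrations, not the authors' implementation. Models are simplified to flat lists of floats.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # undirected adjacency list: node -> neighbors


def degree_centrality(graph: Graph) -> Dict[int, float]:
    """Normalized degree centrality: deg(v) / (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {v: 0.0 for v in graph}
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def centrality_weighted_aggregate(
    graph: Graph,
    params: Dict[int, List[float]],
    centrality: Dict[int, float],
    node: int,
) -> List[float]:
    """Aggregate `node`'s model with its neighbors' models.

    Hypothetical rule: every model in the neighborhood (including `node`'s
    own) is weighted by its owner's centrality score, normalized to sum to 1.
    Falls back to a uniform average if every score is zero.
    """
    group = [node] + list(graph[node])
    scores = [centrality[u] for u in group]
    total = sum(scores)
    if total > 0:
        weights = [s / total for s in scores]
    else:
        weights = [1.0 / len(group)] * len(group)
    dim = len(params[node])
    # Weighted average, coordinate by coordinate.
    return [
        sum(w * params[u][i] for w, u in zip(weights, group))
        for i in range(dim)
    ]


if __name__ == "__main__":
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}  # device 0 is the hub
    models = {0: [0.0], 1: [4.0], 2: [0.0], 3: [0.0]}
    scores = degree_centrality(star)
    print(centrality_weighted_aggregate(star, models, scores, node=1))
```

In the star example, leaf device 1 gives its hub neighbor three times the weight of its own model, because the hub's degree centrality (1.0) is three times the leaf's (1/3). Any other per-node score, such as betweenness, can be passed as `centrality` instead.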
Abstract
Decentralized learning enables collaborative training of models across naturally distributed data without centralized coordination or maintenance of a global model. Instead, devices are organized in arbitrary communication topologies, in which they can only communicate with neighboring devices. Each device maintains its own local model by training on its local data and integrating new knowledge via model aggregation with neighbors. Therefore, knowledge is propagated across the topology via successive aggregation rounds. In particular, we study the propagation of out-of-distribution (OOD) knowledge. We find that popular decentralized learning algorithms struggle to propagate OOD knowledge effectively to all devices. Further, we find that both the location of OOD data within a topology and the topology itself significantly impact OOD knowledge propagation. We then propose…
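To make "knowledge is propagated across the topology via successive aggregation rounds" concrete, here is a minimal sketch of plain gossip averaging (not the paper's algorithm) with Metropolis-Hastings mixing weights, a standard choice that yields a symmetric, doubly stochastic mixing matrix. As a deliberate simplification, each device's model is a single scalar standing in for how much OOD knowledge it holds: the device that owns the OOD data starts at 1.0, all others at 0.0, and local training is not modelled. The function names are hypothetical.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # undirected adjacency list: node -> neighbors


def metropolis_weights(graph: Graph) -> Dict[int, Dict[int, float]]:
    """Metropolis-Hastings mixing weights: W_ij = 1 / (1 + max(d_i, d_j))
    for each neighbor j, and W_ii = 1 - sum of the off-diagonal row entries."""
    W: Dict[int, Dict[int, float]] = {}
    for i, nbrs in graph.items():
        row = {j: 1.0 / (1 + max(len(nbrs), len(graph[j]))) for j in nbrs}
        row[i] = 1.0 - sum(row.values())  # self-weight
        W[i] = row
    return W


def gossip_round(W: Dict[int, Dict[int, float]], x: Dict[int, float]) -> Dict[int, float]:
    """One synchronous aggregation round: every device averages with its neighbors."""
    return {i: sum(w * x[j] for j, w in row.items()) for i, row in W.items()}


def simulate(graph: Graph, source: int, rounds: int) -> Dict[int, float]:
    """Inject OOD 'knowledge' (value 1.0) at `source` and run gossip rounds."""
    W = metropolis_weights(graph)
    x = {v: 0.0 for v in graph}
    x[source] = 1.0
    for _ in range(rounds):
        x = gossip_round(W, x)
    return x


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    for src in (0, 2):  # OOD data at an endpoint vs. at the center
        x = simulate(path, src, rounds=3)
        print(src, min(x.values()))
```

On this five-node path, an OOD source at an endpoint has not reached the far endpoint after three rounds (it is four hops away), while a source at the center reaches every device within two rounds: a toy version of the finding that data location matters. Because the mixing matrix is doubly stochastic, the total "knowledge" stays at 1.0 and is only redistributed, so the OOD signal is diluted toward 1/n at every device.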
Peer Reviews
Decision: Submitted to ICLR 2026
1. Clear identification of OOD propagation as a distinct and critical problem in decentralized learning. 2. Elegant, low-overhead methods that seamlessly integrate with existing gossip algorithms. 3. Comprehensive evaluation: five datasets, three topology families, varying OOD locations, and spectral-gap theory.
1. The paper assumes a single-device OOD “worst case” but does not analyze scenarios with multiple OOD sources, which may arise in practice and interact nonlinearly. 2. Only degree and betweenness are studied, yet other centrality metrics (e.g., eigenvector, closeness) might offer better trade-offs; Appendix C claims negligible cost, but no profiling of alternative metrics is shown. 3. Betweenness computation, even if amortized, scales superlinearly (O(nm + n² log n)), and while Table 3 reports <1
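The O(nm + n² log n) bound quoted in this review is the cost of Brandes' algorithm on weighted graphs, where each of the n single-source passes runs Dijkstra; on unweighted graphs each pass is a breadth-first search and the total drops to O(nm) for n nodes and m edges. A minimal pure-Python sketch of the unweighted, undirected case follows. It is the textbook algorithm, not necessarily the routine the authors used, and the adjacency-list `Graph` type is an assumption.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # undirected adjacency list: node -> neighbors


def betweenness_centrality(graph: Graph) -> Dict[int, float]:
    """Brandes' algorithm for unweighted, undirected graphs in O(nm) time.

    Returns unnormalized betweenness: for each node v, the sum over
    unordered pairs {s, t} of the fraction of shortest s-t paths through v.
    """
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        stack: List[int] = []
        preds: Dict[int, List[int]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of non-increasing distance.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in bc.items()}
```

For example, in a star topology the hub lies on the only shortest path between every pair of leaves, so with three leaves its betweenness is 3, while every leaf scores 0.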
1. The proposed topology-aware aggregation strategies (Degree and Betweenness) are intuitive, simple to implement, and account for the effect of topology on learning. 2. The authors conduct experiments across five different datasets and multiple varied topologies and systematically study the impact of topology degree, modularity, and node count, providing a comprehensive characterization of the solution's performance.
1. The main concern is the assumption about the data distribution and the definition of OOD data. I question the practicality of only one node holding the OOD data. In addition, the experimental setup defines OOD data by a "backdoor" methodology (i.e., inserting triggers), as described in Appendix B.2.2. This setup is very narrow and seems artificial. Also, this definition differs from the common usage of the term OOD. The findings in this work may be hard to generalize to more natural setups. 2. The topol
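For readers unfamiliar with the backdoor-style setup this review describes, the sketch below shows a generic BadNets-style patch trigger: a small square of fixed-value pixels is stamped onto each image and the label is replaced with a target class. Appendix B.2.2 is not reproduced here, so the patch size, location, pixel value, target label, and helper names are all hypothetical, and images are simplified to nested lists of grayscale values in [0, 1].

```python
import copy
from typing import List, Tuple

Image = List[List[float]]  # grayscale H x W image, pixel values in [0, 1]


def add_patch_trigger(image: Image, patch_size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of `image` with a square trigger in the bottom-right corner."""
    h, w = len(image), len(image[0])
    if patch_size > h or patch_size > w:
        raise ValueError("patch does not fit inside the image")
    out = copy.deepcopy(image)  # leave the original sample untouched
    for r in range(h - patch_size, h):
        for c in range(w - patch_size, w):
            out[r][c] = value
    return out


def make_ood_split(
    samples: List[Tuple[Image, int]], target_label: int, patch_size: int = 3
) -> List[Tuple[Image, int]]:
    """Hypothetical OOD construction: trigger every sample, relabel to `target_label`."""
    return [(add_patch_trigger(img, patch_size), target_label) for img, _ in samples]
```

Under this construction, the "OOD knowledge" a device holds is the trigger-to-label association, which other devices can only learn through aggregation, since their own data never contains the trigger.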
Strong Motivation: The problem of OOD knowledge propagation is highly practical and critical in real-world decentralized learning scenarios, such as IoT and edge computing. Comprehensive Experiments: The experimental setup is very extensive, covering a wide range of variables and providing strong empirical evidence for the conclusions. Theoretical Support: The spectral gap analysis in the appendix offers a plausible theoretical explanation for why the topology-aware strategies are superior, movin
There is a lack of clear definitions for OOD and IID knowledge in the decentralized context, which obscures the relationship and distinction between them. The paper provides insufficient justification for selecting degree and betweenness centrality over other potential topological metrics, leaving it unclear whether these are the optimal choices. The analysis is confined to static topologies; applicability and potential overhead in dynamic networks are not discussed.
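The spectral-gap argument mentioned in these reviews rests on a standard consensus bound, sketched below under the assumption of a symmetric, doubly stochastic mixing matrix $W$ on a connected topology with positive self-weights (for example, Metropolis-Hastings weights). This is the generic textbook bound, not a reproduction of the paper's appendix analysis.

```latex
% Eigenvalues of W: 1 = \lambda_1 > \lambda_2 \ge \dots \ge \lambda_n > -1
\lambda = \max\{|\lambda_2|, |\lambda_n|\}, \qquad \delta = 1 - \lambda \quad \text{(spectral gap)}

% Gossip averaging x^{(t+1)} = W x^{(t)}, with average \bar{x} = \tfrac{1}{n}\mathbf{1}^\top x^{(0)}
\bigl\| x^{(t)} - \bar{x}\mathbf{1} \bigr\|_2 \le (1-\delta)^t \, \bigl\| x^{(0)} - \bar{x}\mathbf{1} \bigr\|_2

% Since -\ln(1-\delta) \ge \delta, a relative error of \varepsilon is guaranteed once
t \ge \frac{\ln(1/\varepsilon)}{\delta}
```

Topologies with a larger spectral gap mix faster, whereas sparse or highly modular topologies have a small gap, so knowledge injected at a single device needs more aggregation rounds to reach all others.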
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning
