DICE: Data Influence Cascade in Decentralized Learning
Tongtian Zhu, Wenhao Li, Can Wang, Fengxiang He

TL;DR
DICE introduces a novel method to estimate data influence cascades in decentralized learning networks, enabling fair attribution of contributions and fostering incentive mechanisms in distributed environments.
Contribution
It is the first method to quantify influence cascades in decentralized networks, considering data, topology, and loss landscape curvature.
Findings
Influence cascade depends on data, topology, and loss landscape curvature.
DICE provides tractable approximations for influence over neighbor hops.
Potential applications include collaborator selection and malicious behavior detection.
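To make the idea of influence spreading over neighbor hops concrete, here is a minimal sketch of a one-hop, first-order estimate. It is an illustration under stated assumptions, not the estimator defined in the paper: it assumes decentralized SGD in which node j takes a gradient step on one training example and every node then averages parameters with its neighbors through a mixing matrix `W`. The ring topology, the squared-error loss, and the names `ring_mixing_matrix`, `one_hop_influence`, and `exact_loss_change` are all hypothetical choices made for this example.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 nodes (assumed topology):
    each node averages itself and its two ring neighbors with weight 1/3."""
    W = np.zeros((n, n))
    for k in range(n):
        for j in (k - 1, k, k + 1):
            W[k, j % n] = 1.0 / 3.0
    return W

def loss(theta, x, y):
    # Squared-error loss of a linear model on one example (assumed loss).
    return 0.5 * (x @ theta - y) ** 2

def grad(theta, x, y):
    # Gradient of the loss above with respect to theta.
    return (x @ theta - y) * x

def one_hop_influence(thetas, W, j, z_train, z_test, lr):
    """First-order estimate of the change in test loss at every node k after
    node j takes one SGD step on z_train, followed by one gossip averaging round."""
    g_train = grad(thetas[j], *z_train)
    mixed = W @ thetas  # parameters after gossip if node j had skipped its step
    return np.array([
        -lr * W[k, j] * (grad(mixed[k], *z_test) @ g_train)
        for k in range(len(thetas))
    ])

def exact_loss_change(thetas, W, j, z_train, z_test, lr):
    """Reference value: actual test-loss difference with and without node j's step."""
    stepped = thetas.copy()
    stepped[j] -= lr * grad(thetas[j], *z_train)
    with_step, without_step = W @ stepped, W @ thetas
    return np.array([
        loss(with_step[k], *z_test) - loss(without_step[k], *z_test)
        for k in range(len(thetas))
    ])

# Toy run: 5 nodes on a ring, 4-dimensional linear models, synthetic data.
rng = np.random.default_rng(0)
thetas = rng.normal(size=(5, 4))
z_train = (rng.normal(size=4), 1.0)
z_test = (rng.normal(size=4), 0.0)
W = ring_mixing_matrix(5)
print(one_hop_influence(thetas, W, 0, z_train, z_test, lr=1e-4))
print(exact_loss_change(thetas, W, 0, z_train, z_test, lr=1e-4))
```

In this sketch the estimate is nonzero only at node j and its direct neighbors, because `W[k, j]` is zero elsewhere. Reaching nodes further away would require chaining additional gossip rounds, which is where the dependence on topology shows up.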
Abstract
Decentralized learning offers a promising approach to crowdsourcing data consumption and computational workloads across geographically distributed compute nodes interconnected through peer-to-peer networks, accommodating exponentially increasing demands. However, proper incentives are still lacking, which considerably discourages participation. Our vision is that a fair incentive mechanism relies on fair attribution of contributions to participating nodes, which faces non-trivial challenges because localized connections make influence "cascade" through a decentralized network. To overcome this, we design the first method to estimate Data Influence CascadE (DICE) in a decentralized environment. Theoretically, the framework derives tractable approximations of influence cascade over arbitrary neighbor hops, suggesting the influence cascade is…
Peer Reviews
Decision: ICLR 2025 Poster
1. The DICE framework is the first to systematically measure the cascading propagation of data influence in decentralized learning environments, providing an effective method to assess data contributions among nodes and filling a gap in data influence evaluation within decentralized networks. 2. The experiments cover different network topologies (such as ring and exponential graphs) and datasets (such as MNIST, CIFAR-10, and CIFAR-100), validating the applicability and consistency of the DICE fr
1. Figure 1 lacks legend information, making it difficult to understand. 2. The performance differences of the DICE framework under different parameters (such as learning rate, batch size, etc.) have not been thoroughly discussed. It is recommended to add parameter sensitivity experiments to demonstrate the impact of different parameter selections on the performance of the DICE framework, thereby enhancing its practicality.
- The paper is well-organized, with clear definitions, figures, and explanations that make the methods and results easy to follow. - The paper provides a solid theoretical framework, supported by rigorous proofs and analyses.
- Need for more details about the practical use of this technique: While the authors use LLMs as one of the examples in the introduction, it might not be the best example to use in this case. It is hard to see how this research addresses a practical problem or application that has real-world significance, or how this framework would be relevant for practitioners. - Links to other papers that use gradients to cluster clients should be added, particularly interesting and relevant in the collaborator
1. This paper summarizes previous work on measuring data influence and highlights the gaps in applying these methods to distributed scenarios. 2. This paper proposes a sound “gold standard” and its first-order approximation to quantify individual contributions in decentralized learning.
1. The experiments are weak, and Section 5.3 is unfinished. 2. The notation η^t in Theorem 1 previously appears as η_t in Algorithm 1.
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Peer-to-Peer Network Technologies · Privacy-Preserving Technologies in Data
