A Large Scale Analysis of Unreliable Stochastic Networks
Reza Aghajani, Philippe Robert, Wen Sun

TL;DR
This paper models the reliability of large distributed file storage networks with a new stochastic framework, characterizing their asymptotic behavior as the number of servers grows and bounding the exponential rate at which file availability decays.
Contribution
It introduces a novel stochastic model for large-scale unreliable networks, demonstrating convergence to a nonlinear Markov process and deriving decay rate bounds.
Findings
The network's evolution converges to a nonlinear Markov process as the number of servers increases.
A mean-field convergence result is established for the model.
A lower bound on the exponential decay rate of file availability is derived.
Abstract
The problem of reliability of a large distributed system is analyzed via a new mathematical model. A typical framework is a system where a set of files is duplicated on several data servers. When one of these servers breaks down, all copies of files stored on it are lost. In this way, repeated failures may lead to losses of files. The efficiency of such a network is directly related to the performance of the mechanism used to duplicate files on servers. In this paper we study the evolution of the network using a natural duplication policy giving priority to the files with the least number of copies. We investigate the asymptotic behavior of the network when the number of servers is large. The analysis is complicated by the large dimension of the state space of the empirical distribution of the state of the network. A stochastic model of the evolution of the network which has…
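As a rough illustration of the mechanism the abstract describes (server breakdowns erase copies, and duplication favors the files with the fewest copies), here is a minimal toy simulation. It is a sketch under stated assumptions, not the paper's model: time is discrete, exactly one server fails per step and is immediately replaced by an empty one, and the function name `simulate` and the parameters `max_copies` and `repairs_per_step` are hypothetical choices made for this example.

```python
import random


def simulate(num_servers=20, num_files=50, max_copies=2,
             steps=200, repairs_per_step=3, seed=0):
    """Toy discrete-time simulation; every parameter is an illustrative assumption.

    Returns the fraction of files with at least one copy after each step.
    """
    rng = random.Random(seed)
    # placement[f] is the set of servers currently holding a copy of file f.
    placement = [set(rng.sample(range(num_servers), max_copies))
                 for _ in range(num_files)]
    alive_history = []
    for _ in range(steps):
        # One server breaks down: every copy stored on it is lost.
        # Assumption: it is replaced at once by an empty server.
        failed = rng.randrange(num_servers)
        for holders in placement:
            holders.discard(failed)
        # Duplication gives priority to files with the fewest remaining copies.
        # A file with zero copies is lost for good and cannot be repaired.
        for _ in range(repairs_per_step):
            candidates = [h for h in placement if 0 < len(h) < max_copies]
            if not candidates:
                break
            target = min(candidates, key=len)
            free = [s for s in range(num_servers) if s not in target]
            target.add(rng.choice(free))
        alive_history.append(sum(1 for h in placement if h) / num_files)
    return alive_history


if __name__ == "__main__":
    history = simulate()
    print("fraction of files still available:", history[-1])
```

Because lost files never come back in this sketch, the recorded fraction can only stay flat or decrease over time, which is the kind of decay the paper bounds analytically in the large-network limit.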
Taxonomy
Topics: Peer-to-Peer Network Technologies · Advanced Queuing Theory Analysis · Distributed systems and fault tolerance
