Controlling Data Access Load in Distributed Systems
Mehmet Aktas, Emina Soljanin

TL;DR
This paper analyzes how storage redundancy levels and data object assignment strategies affect load balancing in distributed systems, providing theoretical bounds and insights for different storage schemes.
Contribution
It introduces a formal analysis of load balancing in distributed storage, deriving necessary redundancy levels and comparing different data assignment schemes.
Findings
The redundancy factor d must grow as Ω((log n)^(1/3)) in the number of nodes n for load balance to be achievable under any storage design.
Clustering and cyclic designs require d = Ω(log n) for effective load balancing, and this level is also sufficient for them.
Random and block designs achieve load balance when d = Ω(log n), and may do so with lower redundancy.
Abstract
Distributed systems store data objects redundantly to balance the data access load over multiple nodes. Load balancing performance depends mainly on 1) the level of storage redundancy and 2) the assignment of data objects to storage nodes. We analyze the performance implications of these design choices by considering four practical storage schemes that we refer to as clustering, cyclic, block and random design. We formulate the problem of load balancing as maintaining the load on any node below a given threshold. Regarding the level of redundancy, we find that the desired load balance can be achieved in a system of n nodes only if the replication factor d = Ω((log n)^(1/3)), which is a necessary condition for any storage design. For clustering and cyclic designs, d = Ω(log n) is necessary and sufficient. For block and random designs, d = Ω(log n) is sufficient…
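To make the abstract's setup concrete, the following is a minimal Python sketch of three of the named storage schemes and of the threshold criterion. The abstract names the designs but does not define them in this excerpt, so the definitions used here (disjoint clusters, consecutive nodes modulo n, uniformly random node sets), the choice of as many objects as nodes, and the even split of each object's demand across its replicas are all assumptions made for illustration. The block design is omitted because it depends on a combinatorial construction not described here. All function names are hypothetical.

```python
import random
from itertools import combinations


def clustering_design(n, d):
    # Assumed definition: nodes are split into n // d disjoint clusters of
    # size d, and object i is replicated on every node of cluster i // d.
    assert n % d == 0, "this sketch assumes d divides n"
    return [set(range((i // d) * d, (i // d) * d + d)) for i in range(n)]


def cyclic_design(n, d):
    # Assumed definition: object i is stored on nodes i, i+1, ..., i+d-1 (mod n).
    return [{(i + j) % n for j in range(d)} for i in range(n)]


def random_design(n, d, seed=0):
    # Assumed definition: each object picks d distinct nodes uniformly at random.
    rng = random.Random(seed)
    return [set(rng.sample(range(n), d)) for _ in range(n)]


def max_overlap(assignment):
    # Largest number of nodes shared by any two objects.
    return max(len(a & b) for a, b in combinations(assignment, 2))


def even_split_loads(assignment, demands, n):
    # Simplification (not an optimal allocation): each object's demand is
    # divided equally among the nodes that store it.
    loads = [0.0] * n
    for nodes, demand in zip(assignment, demands):
        for node in nodes:
            loads[node] += demand / len(nodes)
    return loads


def is_balanced(loads, threshold):
    # Criterion from the abstract: no node's load exceeds the given threshold.
    return max(loads) <= threshold


if __name__ == "__main__":
    n, d = 12, 3
    for name, design in [("clustering", clustering_design),
                         ("cyclic", cyclic_design),
                         ("random", random_design)]:
        assignment = design(n, d)
        loads = even_split_loads(assignment, [1.0] * n, n)
        print(name, "max overlap:", max_overlap(assignment),
              "max load:", max(loads))
```

The even split is only a convenient stand-in: a real system could route demand unevenly among replicas, so this sketch can overestimate the maximum load that a smarter allocation would achieve.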
Taxonomy
Topics: Distributed systems and fault tolerance · Distributed and Parallel Computing Systems · Advanced Database Systems and Queries
