Algorithms for Optimal Replica Placement Under Correlated Failure in Hierarchical Failure Domains
K. Alex Mills, R. Chandrasekaran, Neeraj Mittal

TL;DR
This paper introduces algorithms for optimally placing data replicas across the hierarchical failure domains of a data center, so that correlated failures disable as few replicas as possible, addressing a gap in existing optimization methods.
Contribution
It formulates a novel optimization problem for replica placement and provides efficient algorithms for single and multiple data block scenarios in hierarchical models.
Findings
Dynamic programming algorithm for single-file replica placement with $O(n + \rho \log \rho)$ time complexity.
Exact polynomial-time algorithm for multiple blocks with constant skew.
Improved understanding of optimal replica distribution to enhance data availability.
Abstract
In data centers, data replication is the primary method used to ensure availability of customer data. To avoid correlated failure, cloud storage infrastructure providers model hierarchical failure domains using a tree, and avoid placing a large number of data replicas within the same failure domain (i.e. on the same branch of the tree). Typical best practices ensure that replicas are distributed across failure domains, but relatively little is known concerning optimization algorithms for distributing data replicas. Using a hierarchical model, we answer how to distribute replicas across failure domains optimally. We formulate a novel optimization problem for replica placement in data centers. As part of our problem, we formalize and explain a new criterion for optimizing a replica placement. Our overall goal is to choose placements in which correlated failures disable as few replicas as…
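To make the tree model concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a small hypothetical hierarchy (one data center, three racks, six servers) and counts how many replicas of one block sit under each failure domain, which is the number of replicas disabled together if that domain fails. The names `PARENT`, `replicas_lost_per_domain`, and `worst_case_loss` are invented for illustration, and the worst-case count is only a simple stand-in for the optimization criterion the abstract mentions.

```python
from collections import defaultdict

# Hypothetical failure-domain tree, stored as child -> parent.
# None marks the root; the leaves (s1..s6) are servers that hold replicas.
PARENT = {
    "dc": None,
    "rack1": "dc", "rack2": "dc", "rack3": "dc",
    "s1": "rack1", "s2": "rack1",
    "s3": "rack2", "s4": "rack2",
    "s5": "rack3", "s6": "rack3",
}


def replicas_lost_per_domain(parent, placement):
    """Count, for every node on a path to a replica, the replicas in its subtree.

    If the failure domain rooted at a node fails, that many replicas
    become unavailable at the same time.
    """
    lost = defaultdict(int)
    for server in placement:
        node = server
        while node is not None:  # walk from the leaf up to the root
            lost[node] += 1
            node = parent[node]
    return dict(lost)


def worst_case_loss(parent, placement, root="dc"):
    """Largest number of replicas disabled by a single non-root domain failure."""
    lost = replicas_lost_per_domain(parent, placement)
    return max((count for node, count in lost.items() if node != root), default=0)


clustered = ["s1", "s2", "s3"]  # two replicas share rack1
spread = ["s1", "s3", "s5"]     # one replica per rack

print(worst_case_loss(PARENT, clustered))
print(worst_case_loss(PARENT, spread))
```

In this toy hierarchy the spread placement loses at most one replica to any single rack or server failure, while the clustered placement can lose two at once when `rack1` fails.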
