
TL;DR
This paper introduces 'artificial redundancy', a formal generalization of traditional 'exact copy' redundancy. It lets fault-tolerant systems draw on diverse sources of redundancy, with the aim of making them cheaper and more resilient.
Contribution
It proposes the concepts of 'artificial redundancy' and 'artificial fault tolerance' (AFT), which extend existing fault-tolerant approaches with new, diverse sources of redundancy.
Findings
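AFT extends current fault-tolerance methods to use forms of redundancy other than exact copies.
Artificial redundancy aims to reduce the cost of replication.
AFT targets high diversity among redundant components, which strengthens fault tolerance.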
Abstract
Fault tolerance is essential for building reliable services; however, it comes at the price of redundancy, mainly the "replication factor" and "diversity". With the increasing reliance on Internet-based services, more machines (mainly servers) are needed to scale out, and this number is multiplied by the extra expense of replication. This paper revisits the very fundamentals of fault tolerance and presents "artificial redundancy": a formal generalization of "exact copy" redundancy in which new sources of redundancy are exploited to build fault-tolerant systems. Building on this concept, we show how to build "artificial replication" and design "artificial fault tolerance" (AFT). We discuss the properties of these new techniques, showing that AFT extends current fault-tolerant approaches to use other forms of redundancy, aiming at reduced cost and high diversity.
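To make the multiplicative cost mentioned in the abstract concrete, the sketch below computes how many machines a deployment needs when every server is replicated as exact copies. It is not from the paper. The replica counts are standard textbook bounds under their usual system-model assumptions: f + 1 copies for crash faults with primary-backup, 2f + 1 for majority-quorum consensus (Paxos/Raft style), and 3f + 1 for Byzantine faults (PBFT style). The names `REPLICAS_FOR_F` and `total_machines` are hypothetical.

```python
"""Back-of-the-envelope cost of exact-copy replication (illustrative sketch)."""

# Replicas needed per logical server to tolerate f faulty replicas, under
# standard textbook system models (assumption: not taken from the paper).
REPLICAS_FOR_F = {
    "crash_primary_backup": lambda f: f + 1,       # one surviving copy suffices
    "crash_majority_quorum": lambda f: 2 * f + 1,  # Paxos/Raft-style majorities
    "byzantine_bft": lambda f: 3 * f + 1,          # PBFT-style agreement
}


def total_machines(scale_out: int, f: int, model: str) -> int:
    """Machines needed when each of `scale_out` servers is replicated
    as exact copies to tolerate `f` faults under `model`."""
    if scale_out < 0 or f < 0:
        raise ValueError("scale_out and f must be non-negative")
    return scale_out * REPLICAS_FOR_F[model](f)


if __name__ == "__main__":
    # Hypothetical deployment that needs 100 servers for load alone.
    for model in REPLICAS_FOR_F:
        for f in (1, 2):
            print(f"{model:22s} f={f}: {total_machines(100, f, model)} machines")
```

With 100 servers and f = 1, the three models require 200, 300 and 400 machines respectively. Artificial redundancy aims to reduce this replication-factor multiplier.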
Taxonomy
Topics: Distributed systems and fault tolerance · Software System Performance and Reliability · Service-Oriented Architecture and Web Services
