RRFT: A Rank-Based Resource Aware Fault Tolerant Strategy for Cloud Platforms
Chinmaya Kumar Dehury, Prasan Kumar Sahoo, Bharadwaj Veeravalli

TL;DR
This paper introduces RRFT, a resource-aware fault tolerance strategy for cloud platforms that ranks components based on significance and uses a Markov Decision Process to optimize replica allocation, reducing resource overhead.
Contribution
It presents a novel ranking-based fault tolerance approach combined with a Markov Decision Process (MDP) that dynamically determines the number of replicas, improving resource efficiency in cloud applications.
Findings
Reduces virtual machine usage by ~10%
Decreases physical machine requirements by ~4.2%
Maintains fault tolerance with fewer resources
Abstract
Applications deployed in the cloud to provide services to users comprise a large number of interconnected, dependent cloud components. Multiple identical components are scheduled to run concurrently in order to handle unexpected failures and provide uninterrupted service to the end user, which introduces a resource overhead problem for the cloud service provider. Furthermore, such resource-intensive fault-tolerant strategies bring extra monetary overhead to the cloud service provider and eventually to the cloud users. To address these issues, a novel fault-tolerant strategy based on the significance level of each component is developed. The communication topology among the application components, their historical performance, failure rate, failure impact on other components, dependencies among them, etc., are used to rank those application components to further…
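As a rough illustration of the ranking idea described in the abstract, the sketch below scores components with a PageRank-style iteration over a dependency graph and then maps scores to replica counts. It is not the authors' RRFT algorithm: the function names, the `(1 + failure_rate)` weighting, the proportional replica rule, and the example graph are all assumptions made for illustration, and the MDP step is not modeled here.

```python
from typing import Dict, List


def rank_components(
    depends_on: Dict[str, List[str]],
    failure_rate: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Hypothetical significance score: PageRank-style power iteration.

    depends_on[a] lists the components that `a` relies on, so a component
    that many others rely on accumulates a higher score.
    """
    nodes = sorted(depends_on)
    n = len(nodes)
    score = {c: 1.0 / n for c in nodes}
    for _ in range(iterations):
        new = {c: (1.0 - damping) / n for c in nodes}
        for src in nodes:
            targets = depends_on[src]
            if targets:
                share = damping * score[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A component with no dependencies spreads its score evenly.
                share = damping * score[src] / n
                for t in nodes:
                    new[t] += share
        score = new
    # Assumed weighting: failure-prone components count as more significant.
    weighted = {c: score[c] * (1.0 + failure_rate[c]) for c in nodes}
    total = sum(weighted.values())
    return {c: weighted[c] / total for c in nodes}


def allocate_replicas(
    significance: Dict[str, float], max_replicas: int = 3
) -> Dict[str, int]:
    """Assumed rule: replicas proportional to significance, at least one."""
    top = max(significance.values())
    return {
        c: max(1, round(max_replicas * s / top))
        for c, s in significance.items()
    }


# Hypothetical four-component application.
depends_on = {
    "gateway": ["auth", "orders"],
    "auth": ["db"],
    "orders": ["db"],
    "db": [],
}
failure_rate = {"gateway": 0.01, "auth": 0.02, "orders": 0.03, "db": 0.05}

significance = rank_components(depends_on, failure_rate)
replicas = allocate_replicas(significance)
```

In this toy graph the database is the component every request path ends at, so it ranks highest and receives the full replica budget, while less significant components receive fewer replicas. That is the resource saving the summary refers to, shown here only under the stated assumptions.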
