MSF-Model: Queuing-Based Analysis and Prediction of Metastable Failures in Replicated Storage Systems
Farzad Habibi, Tania Lorido-Botran, Ahmad Showail, Daniel C. Sturman, Faisal Nawab

TL;DR
This paper introduces MSF-Model, a queuing-based analytical framework for predicting metastable failures in replicated storage systems, validated through real experiments showing high prediction accuracy.
Contribution
The paper presents a novel queuing-based analytical model of metastable failures in replicated storage systems, centered on consensus, a failure pattern that was intractable to model before this work.
Findings
MSF-Model accurately predicts metastable failures.
Real experiments validate the model's effectiveness.
The approach enhances understanding of failure dynamics in distributed systems.
Abstract
Metastable failure is a recent abstraction of a pattern of failures that occurs frequently in real-world distributed storage systems. In this paper, we propose a formal analysis and modeling of metastable failures in replicated storage systems. We focus on a foundational problem in distributed systems -- the problem of consensus -- to have an impact on a large class of systems. Our main contribution is the development of a queuing-based analytical model, MSF-Model, that can be used to characterize and predict metastable failures. MSF-Model integrates novel modeling concepts that make it possible to model metastable failures, which were intractable to model prior to our work. We also perform real experiments to reproduce and validate our model. Our real experiments show that MSF-Model predicts metastable failures with high accuracy by comparing the real experiment with the predictions from the…
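To make the failure pattern in the abstract concrete, the sketch below simulates a toy single-queue system in which client retries amplify load. It is not MSF-Model and is not taken from the paper; it is a minimal fluid approximation under stated assumptions: the function `simulate`, all parameter names, and the rule that every request is retried exactly once when the estimated wait exceeds the timeout are hypothetical choices made only for illustration.

```python
def simulate(ticks, lam, mu, timeout, spike_start, spike_end, spike_lam, retries=True):
    """Toy discrete-time fluid queue (illustrative only, not MSF-Model).

    lam       : normal arrivals per tick
    mu        : service capacity per tick
    timeout   : client timeout, in ticks
    spike_*   : a temporary load spike (the trigger)
    retries   : whether timed-out requests are retried
    Returns the queue length after each tick.
    """
    q = 0.0
    history = []
    for t in range(ticks):
        # Base load: elevated only while the trigger is active.
        base = spike_lam if spike_start <= t < spike_end else lam
        # Estimated waiting time for a request that arrives now.
        wait = q / mu
        # Assumption: once the wait exceeds the timeout, each request is
        # retried once, which doubles the offered load.
        offered = base * 2 if (retries and wait > timeout) else base
        # Serve up to mu requests per tick; the queue cannot go negative.
        q = max(0.0, q + offered - mu)
        history.append(q)
    return history


if __name__ == "__main__":
    # Normal load (6) is below capacity (10), but doubled load (12) is not.
    args = dict(ticks=200, lam=6, mu=10, timeout=5,
                spike_start=10, spike_end=20, spike_lam=30)
    with_retries = simulate(retries=True, **args)
    no_retries = simulate(retries=False, **args)
    print("final queue with retries:   ", with_retries[-1])
    print("final queue without retries:", no_retries[-1])
```

With these hypothetical parameters, the queue drains after the spike when retries are disabled, but keeps growing when retries are enabled even though the trigger has ended. That persistence after the trigger is removed is the defining property of a metastable failure.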
Taxonomy
Topics: Distributed systems and fault tolerance · Distributed and Parallel Computing Systems · Peer-to-Peer Network Technologies
