The Design and Architecture of the Microsoft Cluster Service -- A Practical Approach to High-Availability and Scalability
Werner Vogels, Dan Dumitriu, Ken Birman, Rod Gamache, Mike Massa, Rob Short, John Vert, Joe Barrera

TL;DR
This paper details the architecture and design decisions of Microsoft Cluster Service (MSCS), focusing on how it provides high availability for off-the-shelf server applications on Windows NT and how later versions are intended to add scalability.
Contribution
It provides a comprehensive description of MSCS architecture, design choices, and application integration, highlighting practical approaches to high-availability and scalability.
Findings
MSCS enables fault-tolerance for server applications.
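Scalability is planned for later versions through a node and application management system intended to let applications scale to hundreds of nodes.
Features have been added to make fault-tolerant applications easier to implement and manage on MSCS.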
Abstract
Microsoft Cluster Service (MSCS) extends the Windows NT operating system to support high-availability services. The goal is to offer an execution environment where off-the-shelf server applications can continue to operate, even in the presence of node failures. Later versions of MSCS will provide scalability via a node and application management system that allows applications to scale to hundreds of nodes. This paper provides a detailed description of the MSCS architecture and the design decisions that have driven the implementation of the service. The paper also describes how some major applications use the MSCS features, and describes features added to make it easier to implement and manage fault-tolerant applications on MSCS.
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems
