Resilience through Automated Adaptive Configuration for Distribution and Replication
Scott D. Stoller, Balaji Jayasankar, Yanhong A. Liu

TL;DR
This paper introduces an automated framework that enhances system resilience by optimizing the distribution and replication of software components across heterogeneous hardware, using a novel state-space exploration algorithm.
Contribution
The paper presents a new algorithm for resilient configuration and reconfiguration of software systems, incorporating a quotient reduction technique for efficient state-space exploration.
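The quotient reduction idea can be illustrated with a small sketch. The source does not spell out the paper's quotient construction, so this example assumes a common form of it. Hosts with identical descriptions (hardware type, up/down status, assigned components) are treated as interchangeable. Each explored state is then replaced by a canonical representative that forgets host identities. The names Host, canonical, successors, explore, and host are hypothetical, and the only transitions modeled are host failures.

```python
from collections import deque
from typing import FrozenSet, Iterator, Tuple

# Hypothetical host description: (hardware_type, alive, components_placed).
Host = Tuple[str, bool, FrozenSet[str]]
State = Tuple[Host, ...]


def canonical(state: State) -> State:
    """Quotient map (assumed form): sort host descriptions so that states
    differing only by a permutation of identical hosts get the same key."""
    return tuple(sorted(state, key=lambda h: (h[0], h[1], sorted(h[2]))))


def successors(state: State) -> Iterator[State]:
    """Failure transitions only: any alive host may fail."""
    for i, (hw, alive, comps) in enumerate(state):
        if alive:
            yield state[:i] + ((hw, False, comps),) + state[i + 1:]


def explore(initial: State, use_quotient: bool) -> int:
    """Breadth-first search over failure states; returns the number of
    distinct states (or equivalence classes, with the quotient) visited."""
    key = canonical if use_quotient else (lambda s: s)
    start = key(initial)
    seen = {start}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        for nxt in successors(state):
            k = key(nxt)
            if k not in seen:
                seen.add(k)
                frontier.append(k)  # canonical states are valid states too
    return len(seen)


def host(hw: str, comp: str) -> Host:
    """Build an alive host running a single component."""
    return (hw, True, frozenset({comp}))


initial: State = (
    host("gpu", "perception"), host("gpu", "perception"),
    host("cpu", "planning"), host("cpu", "planning"),
)
print(explore(initial, use_quotient=False))  # 16
print(explore(initial, use_quotient=True))   # 9
```

With two interchangeable GPU hosts and two interchangeable CPU hosts, plain exploration visits all 16 subsets of failed hosts. The quotient visits only 9 classes (0 to 2 failed hosts of each type). Canonicalizing by sorting is cheap, but it only captures full interchangeability of identical hosts. A realistic system model would need a symmetry notion that also respects component dependencies.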
Findings
Successfully applied to an autonomous driving system model
Demonstrated improved resilience through adaptive reconfiguration
Efficient state-space exploration with quotient reduction
Abstract
This paper presents a powerful automated framework for making complex systems resilient under failures, by optimized adaptive distribution and replication of interdependent software components across heterogeneous hardware components with widely varying capabilities. A configuration specifies how software is distributed and replicated: which software components to run on each computer, which software components to replicate, which replication protocols to use, etc. We present an algorithm that, given a system model and resilience requirements, (1) determines initial configurations of the system that are resilient, and (2) generates a reconfiguration policy that determines reconfiguration actions to execute in response to failures and recoveries. This model-finding algorithm is based on state-space exploration and incorporates powerful optimizations, including a quotient reduction based…
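The configuration notion and the two outputs described in the abstract can be made concrete with a minimal sketch. It assumes a much simpler model than the paper's:

- A configuration only records which hosts run a replica of each component.
- The resilience requirement is "tolerate any set of up to f host failures".
- The reconfiguration policy is computed greedily for each failure set, not by the paper's state-space exploration.

Replication protocols, interdependencies between components, and recoveries are not modeled. All names (survives, is_resilient, reconfigure) and the host, capacity, and demand values are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical configuration: component name -> hosts running a replica.
Config = Dict[str, List[str]]
Action = Tuple[str, str]  # (component to start, host to start it on)


def survives(config: Config, failed: FrozenSet[str]) -> bool:
    """Every component keeps at least one replica on a non-failed host."""
    return all(any(h not in failed for h in hs) for hs in config.values())


def is_resilient(config: Config, hosts: List[str], f: int) -> bool:
    """Assumed requirement: tolerate any set of up to f host failures."""
    return all(
        survives(config, frozenset(failed))
        for k in range(f + 1)
        for failed in combinations(hosts, k)
    )


def reconfigure(config: Config, hosts: List[str], capacity: Dict[str, int],
                demand: Dict[str, int],
                failed: FrozenSet[str]) -> Optional[List[Action]]:
    """Greedy stand-in for one reconfiguration-policy entry: restart each
    component that lost all replicas on the first alive host with spare
    capacity. Returns None if some component cannot be placed."""
    load = {h: 0 for h in hosts}
    for comp, hs in config.items():
        for h in hs:
            if h not in failed:
                load[h] += demand[comp]
    actions: List[Action] = []
    for comp, hs in sorted(config.items()):
        if all(h in failed for h in hs):
            for h in hosts:
                if h not in failed and load[h] + demand[comp] <= capacity[h]:
                    load[h] += demand[comp]
                    actions.append((comp, h))
                    break
            else:
                return None  # no feasible placement for comp
    return actions


hosts = ["ecu1", "ecu2", "gpu1"]
config: Config = {"perception": ["gpu1"], "planning": ["ecu1", "ecu2"]}
capacity = {"ecu1": 4, "ecu2": 6, "gpu1": 8}
demand = {"perception": 3, "planning": 2}

print(is_resilient(config, hosts, f=1))  # False: gpu1 is a single point of failure
# A policy maps each single-host failure to a list of reconfiguration actions.
policy = {h: reconfigure(config, hosts, capacity, demand, frozenset({h}))
          for h in hosts}
print(policy)  # {'ecu1': [], 'ecu2': [], 'gpu1': [('perception', 'ecu2')]}
```

In this example, the check flags gpu1 as a single point of failure. The greedy policy responds to a gpu1 failure by restarting perception on ecu2, the only ECU with enough spare capacity. A greedy rule can miss feasible reconfigurations that a search over configurations would find. That is one motivation for a search-based approach like the one the paper describes.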
Taxonomy
Topics: Distributed systems and fault tolerance · Advanced Software Engineering Methodologies · Software System Performance and Reliability
