Learning Recovery Strategies for Dynamic Self-healing in Reactive Systems
Mateo Sanabria, Ivana Dusparic, Nicolas Cardozo

TL;DR
This paper introduces a novel self-healing framework for reactive systems that uses expressive runtime monitors and reinforcement learning to dynamically detect failures and execute recovery strategies, improving resilience.
Contribution
It proposes a new approach combining expressive runtime monitors with reinforcement learning to learn and execute recovery strategies in complex reactive systems.
Findings
Achieved a 55%-92% failure recovery rate in the mouse movement tracking application.
Performed on par with predefined strategies in the DeltaIoT self-healing system.
Demonstrated feasibility of dynamic recovery strategy extraction and execution.
Abstract
Self-healing systems depend on following a set of predefined instructions to recover from a known failure state. Failure states are generally detected based on domain-specific, specialized metrics. Failure fixes are applied at predefined application hooks that are not sufficiently expressive to manage different failure types. Self-healing is usually applied in the context of distributed systems, where the detection of failures is constrained to communication problems, and resolution strategies often consist of replacing complete components. Our proposal targets complex reactive systems, defining monitors as predicates specifying satisfiability conditions of system properties. Such monitors are functionally expressive and can be defined at run time to detect failure states at any execution point. Once failure states are detected, we use a Reinforcement Learning-based technique to learn a…
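The two ideas in the abstract, monitors as predicates over system state and a learned recovery strategy, can be illustrated with a small sketch. This is not the authors' implementation: the abstract only says "a Reinforcement Learning-based technique", so plain tabular Q-learning is used here as a stand-in, and the toy queue domain, the monitor `queue_ok`, the action names, and all parameter values are hypothetical choices made for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Monitor = Callable[[State], bool]  # True when the monitored property holds

# Hypothetical actions available to the recovery mechanism.
ACTIONS = ["noop", "drain", "grow"]


def queue_ok(state: State) -> bool:
    """Hypothetical monitor: the property holds while the queue is short."""
    return state["queue"] <= 3


def step(state: State, action: str) -> State:
    """Toy, deterministic system dynamics (an assumption of this sketch)."""
    queue = state["queue"]
    if action == "drain":
        queue = max(0, queue - 2)
    elif action == "grow":
        queue = min(6, queue + 1)
    return {"queue": queue}


def learn_recovery(
    monitor: Monitor,
    episodes: int = 500,
    alpha: float = 0.5,
    gamma: float = 0.9,
    epsilon: float = 0.2,
    seed: int = 0,
) -> Dict[Tuple[int, str], float]:
    """Tabular Q-learning over failure states; reward comes from the monitor."""
    rng = random.Random(seed)
    q: Dict[Tuple[int, str], float] = {}
    for _ in range(episodes):
        state = {"queue": rng.randint(4, 6)}  # every episode starts in a failure state
        for _ in range(10):
            key = state["queue"]
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q.get((key, a), 0.0))  # exploit
            nxt = step(state, action)
            recovered = monitor(nxt)
            reward = 1.0 if recovered else -0.1
            best_next = 0.0 if recovered else max(
                q.get((nxt["queue"], a), 0.0) for a in ACTIONS
            )
            old = q.get((key, action), 0.0)
            q[(key, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if recovered:
                break
    return q


def recovery_strategy(
    q: Dict[Tuple[int, str], float],
    monitor: Monitor,
    state: State,
    max_steps: int = 10,
) -> List[str]:
    """Greedily follow the learned values until the monitor is satisfied."""
    plan: List[str] = []
    while not monitor(state) and len(plan) < max_steps:
        key = state["queue"]
        action = max(ACTIONS, key=lambda a: q.get((key, a), 0.0))
        plan.append(action)
        state = step(state, action)
    return plan


if __name__ == "__main__":
    table = learn_recovery(queue_ok)
    print(recovery_strategy(table, queue_ok, {"queue": 6}))
```

Because the monitor is an ordinary predicate passed in as a value, a different property can be swapped in at run time without touching the learning loop, which mirrors the flexibility the abstract attributes to its monitors.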
Taxonomy
Topics: Software System Performance and Reliability · Advanced Software Engineering Methodologies · Advanced Malware Detection Techniques
