Software that Learns from its Own Failures

Martin Monperrus

arXiv:1502.00821 · cs.SE · February 4, 2015 · 2 cites

Open Access

TL;DR

This paper proposes a novel paradigm where software systems actively learn from failures using monitoring and self-injection of failures to improve their recovery strategies and robustness over time.

Contribution

It introduces a new approach for software to learn from failures through monitoring and self-injection, enabling adaptive recovery strategies.

Findings

01 Systems can automatically explore alternative recovery strategies.

02 Self-injection of controlled failures lets a system assess its own recovery capabilities and robustness.

03 Knowledge gained from controlled failures prepares the system to recover from future unanticipated failures.

Abstract

All non-trivial software systems suffer from unanticipated production failures. However, those systems are passive with respect to failures and do not take advantage of them to improve their future behavior: they simply wait for failures to happen and then trigger hard-coded recovery strategies. Instead, I propose a new paradigm in which software systems learn from their own failures. Using an advanced monitoring system, they have constant awareness of their own state and health. They are designed to automatically explore alternative recovery strategies inferred from past successful and failed executions. Their recovery capabilities are assessed by self-injection of controlled failures; this process produces knowledge in anticipation of future unanticipated failures.
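
The loop described in the abstract (monitor outcomes, explore alternative recovery strategies, self-inject controlled failures) could be sketched roughly as follows. This is an illustrative toy, not the paper's design: the class `LearningRecovery`, the strategy names `retry` and `fallback_cache`, and the success-rate ranking are all assumptions introduced here for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

# A recovery strategy takes the failure and reports whether it recovered.
Strategy = Callable[[Exception], bool]


class LearningRecovery:
    """Hypothetical sketch: remembers which strategy worked for which failure kind."""

    def __init__(self, strategies: Dict[str, Strategy]):
        self.strategies = strategies
        # Monitoring log: failure kind -> strategy name -> [successes, attempts]
        self.history = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def ranked(self, kind: str) -> List[str]:
        """Order strategies by observed success rate (untried ones score 0.5)."""
        def score(name: str) -> float:
            ok, tries = self.history[kind][name]
            return ok / tries if tries else 0.5
        return sorted(self.strategies, key=score, reverse=True)

    def recover(self, failure: Exception) -> Optional[str]:
        """Try strategies best-first, record every outcome, return the winner."""
        kind = type(failure).__name__
        for name in self.ranked(kind):
            succeeded = self.strategies[name](failure)
            record = self.history[kind][name]
            record[1] += 1
            if succeeded:
                record[0] += 1
                return name
        return None  # no known strategy recovered from this failure

    def self_inject(self, make_failure: Callable[[], Exception], rounds: int) -> None:
        """Inject controlled failures to build knowledge before production needs it."""
        for _ in range(rounds):
            self.recover(make_failure())


# Two made-up strategies: retrying never helps here, the cache fallback always does.
def retry(failure: Exception) -> bool:
    return False


def fallback_cache(failure: Exception) -> bool:
    return True


engine = LearningRecovery({"retry": retry, "fallback_cache": fallback_cache})
engine.self_inject(lambda: TimeoutError("injected"), rounds=3)
print(engine.ranked("TimeoutError"))  # ['fallback_cache', 'retry']
```

After the injection rounds, a real `TimeoutError` in production would be handled with `fallback_cache` first, skipping the strategy that already failed. A real system would need a far richer notion of system state and failure similarity than the exception type name used here.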

Taxonomy

Topics: Software System Performance and Reliability · Software Reliability and Analysis Research · Advanced Software Engineering Methodologies