End-User Effects of Microreboots in Three-Tiered Internet Systems
George Candea, Armando Fox

TL;DR
Microreboots restart specific software components quickly and with minimal disruption, providing an effective and low-cost recovery method for Internet services that improves availability and user experience.
Contribution
This paper shows that microreboots are nearly as effective as full reboots but significantly less disruptive, measured by their effect on end users of an eBay-like online auction application.
Findings
Microreboots reduced the number of failed user requests by 65% compared to a server process restart.
Perceived downtime decreased by 78% against the same baseline.
Microreboots enable aggressive, low-cost recovery strategies.
Abstract
Microreboots restart fine-grained components of software systems "with a clean slate," and only take a fraction of the time needed for full system reboot. Microreboots provide an application-generic recovery technique for Internet services, which can be supported entirely in middleware and requires no changes to the applications or any a priori knowledge of application semantics. This paper investigates the effect of microreboots on end-users of an eBay-like online auction application; we find that microreboots are nearly as effective as full reboots, but are significantly less disruptive in terms of downtime and lost work. In our experiments, microreboots reduced the number of failed user requests by 65% and the perceived downtime by 78% compared to a server process restart. We also show how to replace user-visible transient failures with transparent call-retry, at the cost of a…
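As a rough illustration of the two ideas in the abstract (restarting one component with clean state, and hiding the resulting transient failure behind a retry), here is a minimal Python sketch. It is not the paper's middleware; `Container`, `ComponentRebooting`, `invoke_with_retry`, and the `before_retry` hook are hypothetical names chosen for this example, and the retry count and delay are arbitrary assumptions.

```python
import time


class ComponentRebooting(Exception):
    """Hypothetical signal: the target component is in the middle of a microreboot."""


class Container:
    """Toy stand-in for middleware that hosts independently restartable components."""

    def __init__(self):
        self._factories = {}     # component name -> function that builds fresh state
        self._state = {}         # component name -> current in-memory state
        self._rebooting = set()  # names of components currently being microrebooted

    def deploy(self, name, factory):
        self._factories[name] = factory
        self._state[name] = factory()

    def begin_microreboot(self, name):
        # Discard only this component's state; every other component keeps running.
        self._rebooting.add(name)
        self._state[name] = None

    def finish_microreboot(self, name):
        # Reinitialize the component "with a clean slate" and accept calls again.
        self._state[name] = self._factories[name]()
        self._rebooting.discard(name)

    def invoke(self, name, fn, *args):
        if name in self._rebooting:
            raise ComponentRebooting(name)
        return fn(self._state[name], *args)


def invoke_with_retry(container, name, fn, *args,
                      retries=3, delay=0.01, before_retry=None):
    """Retry a call that hit a rebooting component instead of surfacing the failure."""
    for attempt in range(retries + 1):
        try:
            return container.invoke(name, fn, *args)
        except ComponentRebooting:
            if attempt == retries:
                raise  # the component did not come back in time
            if before_retry is not None:
                before_retry()  # test hook; a real system would simply wait
            time.sleep(delay)
```

The sketch keeps per-component state separate so that a restart of one component leaves the others untouched, which is the property that makes the restart "micro". The retry wrapper trades some added waiting for not showing the caller a failure.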
Taxonomy
Topics: Distributed systems and fault tolerance · Caching and Content Delivery · Software System Performance and Reliability
