Proactive Service Migration for Long-Running Byzantine Fault Tolerant Systems
Wenbing Zhao

TL;DR
This paper introduces a migration-based proactive recovery scheme for long-running Byzantine fault tolerant systems that reduces vulnerability windows and improves system availability by eliminating reboot delays and coordinating recovery among replicas.
Contribution
It presents a novel proactive recovery method based on service migration that automatically adapts to system load and enhances fault tolerance without reboot delays.
Findings
Reduces vulnerability window in Byzantine fault tolerant systems.
Improves system availability during faults.
Automatically adjusts to system load to prevent excessive recoveries.
Abstract
In this paper, we describe a novel proactive recovery scheme based on service migration for long-running Byzantine fault tolerant systems. Proactive recovery is an essential method for ensuring the long-term reliability of fault tolerant systems that are under continuous threat from malicious adversaries. The primary benefit of our proactive recovery scheme is a reduced vulnerability window. This is achieved by removing the time-consuming reboot step from the critical path of proactive recovery. Our migration-based proactive recovery is coordinated among the replicas; it can therefore automatically adjust to different system loads and avoid the problem of excessive concurrent proactive recoveries that may occur in previous work with fixed watchdog timeouts. Moreover, the fast proactive recovery also significantly improves system availability in the presence of faults.
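The contrast between fixed watchdog timeouts and coordinated recovery can be illustrated with a small scheduling sketch. This is not the paper's protocol: the function names, the staggered-timer model, and the rule that at most f replicas recover per round are assumptions made purely for illustration. It only shows why start times fixed in advance can pile up when recoveries take longer under load, while a coordinated schedule cannot.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one replica's recovery


def max_concurrent(intervals: List[Interval]) -> int:
    """Largest number of recoveries in progress at the same instant."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # back-to-back recoveries are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def fixed_watchdog(n: int, stagger: float, duration: float) -> List[Interval]:
    """Hypothetical model: replica i starts at i * stagger, regardless of load."""
    return [(i * stagger, i * stagger + duration) for i in range(n)]


def coordinated(n: int, f: int, duration: float) -> List[Interval]:
    """Hypothetical model: at most f replicas recover per round, and the
    next round starts only after the previous one has finished."""
    schedule: List[Interval] = []
    start = 0.0
    for first in range(0, n, f):
        group_size = min(first + f, n) - first
        schedule.extend((start, start + duration) for _ in range(group_size))
        start += duration
    return schedule


if __name__ == "__main__":
    f, n = 1, 4  # n = 3f + 1 replicas
    light = fixed_watchdog(n, stagger=10, duration=8)   # recovery fits in its slot
    heavy = fixed_watchdog(n, stagger=10, duration=25)  # recovery slowed by load
    print(max_concurrent(light), max_concurrent(heavy))
    print(max_concurrent(coordinated(n, f, duration=25)))
```

In this toy model the fixed timers stay within the fault threshold only while each recovery finishes inside its slot; once load stretches the recovery time, several replicas are down at once, whereas the coordinated schedule keeps the count at f by construction.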
Taxonomy
Topics: Distributed systems and fault tolerance · Real-Time Systems Scheduling · Software System Performance and Reliability
