Scheduling and Checkpointing optimization algorithm for Byzantine fault tolerance in Cloud Clusters
Sathya Chinnathambi, Agilan Santhanam

TL;DR
This paper presents a scheduling and checkpointing optimization algorithm designed to detect, tolerate, and eliminate Byzantine faults in cloud clusters, improving fault tolerance and resource allocation for mission-critical applications.
Contribution
It introduces the WSSS scheduling algorithm and TCC checkpoint optimization algorithm, which together enhance fault detection and minimize overhead in Byzantine fault tolerance.
Findings
TCC reduces fault tolerance overhead exponentially.
WSSS effectively allocates virtual resources based on server performance.
Simulation results validate improved fault tolerance and resource management.
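As a rough illustration of what allocating virtual resources based on server performance can look like, the sketch below splits a batch of VM requests across servers in proportion to a performance score. It is not the WSSS algorithm itself, whose details this summary does not give; the function name `allocate_vms`, the score values, and the proportional largest-remainder rule are all assumptions made for illustration.

```python
def allocate_vms(perf_scores, num_vms):
    """Split num_vms across servers in proportion to performance scores.

    perf_scores: dict mapping a (hypothetical) server name to a
    non-negative performance score. This is an assumed stand-in for
    whatever metric a real scheduler would track.
    """
    total = sum(perf_scores.values())
    if total <= 0:
        raise ValueError("at least one server needs a positive score")

    # Ideal fractional share of the VMs for each server.
    shares = {s: num_vms * score / total for s, score in perf_scores.items()}

    # Start from the whole-number part of each share.
    alloc = {s: int(share) for s, share in shares.items()}

    # Hand the leftover VMs to the servers with the largest remainders.
    leftover = num_vms - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda s: shares[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


if __name__ == "__main__":
    print(allocate_vms({"server-a": 3.0, "server-b": 1.0}, 4))
```

Under this assumed rule, a server with three times the score of another receives roughly three times as many VMs, and the total always equals the number requested.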
Abstract
Among those faults, Byzantine faults pose a serious challenge to fault tolerance mechanisms, because they often go undetected at the initial stage and can easily propagate to other VMs before detection. Consequently, some mission-critical applications, such as air traffic control and online banking, still stay away from the cloud. Moreover, if a Byzantine fault is not detected and tolerated at the initial stage, applications such as big data analytics can go completely wrong despite hours of computation performed by the entire cloud. Previous work therefore proposed a fool-proof Byzantine fault detection method; as a continuation, this work designs a scheduling algorithm (WSSS) and a checkpoint optimization algorithm (TCC) to tolerate and eliminate Byzantine faults before they have any impact. The WSSS algorithm keeps track of server…
