Predictive Bayesian Arbitration: A Scalable Noisy-OR Model with Service Criticality Awareness
Anil Jangam, Ganesh Karthick Rajendran, Roy Kantharajah

TL;DR
This paper introduces a scalable, predictive arbitration framework for Geo-HA clusters using a Bayesian Noisy-OR model that learns failure dependencies and enables proactive switchovers, reducing downtime and detection time.
Contribution
It presents a novel online learning system, based on a Bayesian Noisy-OR model with expert priors, for predictive failure detection in distributed cloud environments.
Findings
Achieves a 60% reduction in failure detection time
Improves switchover efficiency by up to 77.8%
Enables proactive switchovers before failures occur
Abstract
Geographically High-Available (Geo-HA) cluster systems are essential for service continuity in distributed cloud-native environments. However, traditional arbitration mechanisms, which are often predicated on deterministic node-level heartbeats, are resource-intensive and inherently reactive. They require a dedicated arbiter per deployment and trigger switchovers only after a failure has already compromised the system, which incurs unavoidable downtime. This paper presents a novel predictive arbitration framework that uses a shared, microservice-based architecture to consolidate arbitration logic across multiple Geo-HA domains, significantly reducing the aggregate infrastructure footprint. Central to our approach is an adaptive online learning mechanism grounded in a Bayesian Noisy-OR model that autonomously discovers and learns temporal cascade dependencies…
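For readers unfamiliar with the Noisy-OR model named in the abstract, the following is a minimal sketch of the standard textbook formulation, not the authors' implementation. Each active cause independently triggers the effect with its own link probability, and an optional leak term covers unmodeled causes. The names `BetaLink`, `noisy_or`, `leak`, and the service names are hypothetical. The Beta-count update is one simple way an expert prior could be refined online, and it assumes the observation can be attributed to a single active cause.

```python
from dataclasses import dataclass


@dataclass
class BetaLink:
    """Beta(alpha, beta) belief over one cause's link probability.

    Hypothetical helper: the initial alpha/beta stand in for an expert prior.
    """
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Posterior mean of the link probability.
        return self.alpha / (self.alpha + self.beta)

    def update(self, effect_observed: bool) -> None:
        # Simplifying assumption: this cause was the only active one,
        # so the outcome can be credited to it directly.
        if effect_observed:
            self.alpha += 1.0
        else:
            self.beta += 1.0


def noisy_or(link_probs: dict[str, float], active: set[str], leak: float = 0.0) -> float:
    """Return P(effect | active causes) under a Noisy-OR model.

    P = 1 - (1 - leak) * product over active causes of (1 - p_i).
    """
    survive = 1.0 - leak  # probability that nothing triggers the effect
    for name, p in link_probs.items():
        if name in active:
            survive *= 1.0 - p
    return 1.0 - survive


# Hypothetical upstream services and link probabilities.
links = {"db": 0.5, "cache": 0.5, "queue": 0.2}
p_fail = noisy_or(links, active={"db", "cache"})
```

With two active causes at 0.5 each and no leak, the effect probability is 0.75. An arbiter could compare such a probability against a threshold to decide on a proactive switchover, although the excerpt does not specify the decision rule used in the paper.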
