Signalling Health for Improved Kubernetes Microservice Availability
Jacob Roberts, Blair Archibald, Phil Trinder

TL;DR
This paper introduces a Signal-based Container Monitoring (SCM) approach for Kubernetes that detects container health changes faster and more reliably than traditional Poll-based Container Monitoring (PCM), improving service availability.
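The core difference between the two approaches is push versus pull. Below is a minimal in-process sketch of both, assuming a simulated clock passed in as `now`. The names `Orchestrator`, `Container`, `notify`, `poll` and the `"carts"` label are hypothetical. They do not reflect the paper's SCM implementation or any Kubernetes API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model only: these classes are illustrative and are not the
# paper's SCM implementation or any Kubernetes API.

@dataclass
class Orchestrator:
    # (time, container name, status) for every status change the orchestrator learns of
    observed: List[Tuple[float, str, str]] = field(default_factory=list)
    last_known: Dict[str, str] = field(default_factory=dict)

    def notify(self, now: float, name: str, status: str) -> None:
        """SCM path: the container pushes its new status the moment it changes."""
        self._record(now, name, status)

    def poll(self, now: float, containers: List["Container"]) -> None:
        """PCM path: the orchestrator pulls every container's status on its own schedule."""
        for c in containers:
            self._record(now, c.name, c.status)

    def _record(self, now: float, name: str, status: str) -> None:
        # Only log genuine changes, so repeated polls of a steady state add nothing.
        if self.last_known.get(name) != status:
            self.last_known[name] = status
            self.observed.append((now, name, status))


@dataclass
class Container:
    name: str
    status: str = "healthy"
    # Set for SCM; left as None for a purely polled (PCM) container.
    signal: Optional[Callable[[float, str, str], None]] = None

    def set_status(self, now: float, status: str) -> None:
        changed = status != self.status
        self.status = status
        if changed and self.signal is not None:
            self.signal(now, self.name, status)


# SCM: the orchestrator learns of the failure at the instant it happens.
scm = Orchestrator()
signalling = Container("carts", signal=scm.notify)
signalling.set_status(12.3, "unhealthy")

# PCM with a 10 s poll interval: the same failure is only seen at the next poll.
pcm = Orchestrator()
polled = Container("carts")
pcm.poll(0.0, [polled])
pcm.poll(10.0, [polled])
polled.set_status(12.3, "unhealthy")
pcm.poll(20.0, [polled])

print(scm.observed)  # [(12.3, 'carts', 'unhealthy')]
print(pcm.observed)  # [(0.0, 'carts', 'healthy'), (20.0, 'carts', 'unhealthy')]
```

In the polled case, the failure at 12.3 s goes unnoticed for 7.7 s. Shortening the interval narrows that gap but adds probe traffic. This is the tuning trade-off that PCM imposes and SCM avoids.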
Contribution
The paper presents the design, implementation, and empirical evaluation of SCM, demonstrating its advantages over PCM in Kubernetes environments.
Findings
SCM detects container failures 86% faster than PCM.
SCM reduces erroneous failure detections compared to PCM.
SCM achieves faster detection with limited resource overheads.
Abstract
Microservices are often deployed and managed by a container orchestrator that can detect and fix failures to maintain the service availability critical in many applications. In Poll-based Container Monitoring (PCM), the orchestrator periodically checks container health. While a common approach, PCM requires careful tuning, may degrade service availability, and can be slow to detect container health changes. An alternative is Signal-based Container Monitoring (SCM), where the container signals the orchestrator when its status changes. We present the design, implementation, and evaluation of an SCM approach for Kubernetes and empirically show that it has benefits over PCM, as predicted by a new mathematical model. We compare the service availability of SCM and PCM over six experiments using the SockShop benchmark. SCM does not require that polling intervals are tuned, and yet detects…
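A rough model helps explain why polling is slow to detect health changes. A Kubernetes liveness probe runs every `periodSeconds` (P). It declares a container failed only after `failureThreshold` (F) consecutive probe failures. The sketch below is our own simplification, not the paper's mathematical model. It assumes a failure lands at a uniformly random point relative to the probe schedule. It ignores probe timeouts, probe execution time, and the time taken to act on the failure. Under those assumptions, the expected detection delay is P/2 + (F - 1)P, and the worst case approaches F·P. The function names are hypothetical.

```python
import math
import random

def pcm_delay(failure_time: float, period: float, failure_threshold: int,
              phase: float = 0.0) -> float:
    """Seconds from a failure until it is declared, under a simplified probe model.

    Probes run at phase, phase + period, ...; any probe at or after failure_time
    fails, and the container is declared failed on the failure_threshold-th
    consecutive failed probe. Assumes failure_time >= phase.
    """
    first_failed = phase + math.ceil((failure_time - phase) / period) * period
    declared_at = first_failed + (failure_threshold - 1) * period
    return declared_at - failure_time

def expected_pcm_delay(period: float, failure_threshold: int) -> float:
    # Closed form: on average the next probe is half a period away, then
    # (failure_threshold - 1) further periods are needed to reach the threshold.
    return period * (failure_threshold - 0.5)

def mean_pcm_delay(period: float, failure_threshold: int,
                   trials: int = 20_000, seed: int = 0) -> float:
    # Monte Carlo check of the closed form, with failures at random times.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t = rng.uniform(0.0, 100 * period)
        total += pcm_delay(t, period, failure_threshold)
    return total / trials

# Kubernetes' default liveness settings: periodSeconds=10, failureThreshold=3.
print(expected_pcm_delay(10.0, 3))        # 25.0
print(round(mean_pcm_delay(10.0, 3), 1))  # close to 25.0
```

An SCM container, by contrast, pays only the time needed to emit and deliver its signal. That cost does not depend on any polling interval, which is why SCM needs no interval tuning.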
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Software-Defined Networks and 5G
