A generic approach for reactive stateful mitigation of application failures in distributed robotics systems deployed with Kubernetes
Florian Mirus, Frederik Pasch, Nikhil Singhal, Kay-Ulrich Scholl

TL;DR
This paper presents a novel, generic approach for monitoring and reactive failure mitigation in distributed robotic systems deployed with Kubernetes, leveraging Behaviour Trees for application-agnostic resilience enhancement.
Contribution
It introduces a stateful, reactive failure mitigation method using Behaviour Trees, tailored for Kubernetes and ROS2 robotic deployments, addressing a gap in cloud-native robotic resilience.
Findings
Effective failure mitigation demonstrated on AMR navigation
Supports complex monitoring strategies
Applicable to various robotic workloads
Abstract
Offloading computationally expensive algorithms to the edge or even cloud offers an attractive option to tackle limitations regarding on-board computational and energy resources of robotic systems. In cloud-native applications deployed with the container management system Kubernetes (K8s), one key problem is ensuring resilience against various types of failures. However, complex robotic systems interacting with the physical world pose a very specific set of challenges and requirements that are not yet covered by failure mitigation approaches from the cloud-native domain. In this paper, we therefore propose a novel approach for robotic system monitoring and stateful, reactive failure mitigation for distributed robotic systems deployed using Kubernetes (K8s) and the Robot Operating System (ROS2). By employing the generic substrate of Behaviour Trees, our approach can be applied to any…
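The idea of using a Behaviour Tree for stateful, reactive mitigation can be illustrated with a minimal sketch. This is not the authors' implementation: the node classes, the `SimulatedApp` stand-in, and its `checkpoint`, `restart`, and `restore` methods are hypothetical names chosen for illustration, and no Kubernetes or ROS2 API is called. The sketch assumes a tree that checkpoints application state while the application is healthy and, on failure, restarts it and restores the last checkpoint.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Leaf:
    """Wraps a callable that returns True (success) or False (failure)."""

    def __init__(self, name: str, fn: Callable[[], bool]):
        self.name = name
        self.fn = fn

    def tick(self) -> Status:
        return Status.SUCCESS if self.fn() else Status.FAILURE


class Sequence:
    """Ticks children in order; fails at the first failing child."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Fallback:
    """Ticks children in order; succeeds at the first succeeding child."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


class SimulatedApp:
    """Hypothetical stand-in for a containerized robotic application."""

    def __init__(self):
        self.healthy = True
        self.goal = "dock_A"      # in-memory state worth preserving
        self.saved_goal = None    # last checkpoint kept outside the app
        self.restarts = 0

    def is_healthy(self) -> bool:
        return self.healthy

    def checkpoint(self) -> bool:
        self.saved_goal = self.goal
        return True

    def restart(self) -> bool:
        self.goal = None          # a restart loses in-memory state
        self.healthy = True
        self.restarts += 1
        return True

    def restore(self) -> bool:
        self.goal = self.saved_goal
        return self.goal is not None


def build_tree(app: SimulatedApp) -> Fallback:
    # Healthy branch: monitor, then checkpoint state.
    monitor = Sequence([Leaf("healthy?", app.is_healthy),
                        Leaf("checkpoint", app.checkpoint)])
    # Mitigation branch: only ticked when the healthy branch fails.
    mitigate = Sequence([Leaf("restart", app.restart),
                         Leaf("restore", app.restore)])
    return Fallback([monitor, mitigate])
```

Ticking the tree once while the application is healthy records a checkpoint; setting `app.healthy = False` and ticking again takes the mitigation branch, which restarts the simulated application and restores its goal. In a real deployment, the leaves would wrap actual health checks and cluster operations rather than in-memory flags.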
Taxonomy
Topics: Distributed systems and fault tolerance · Software System Performance and Reliability · Service-Oriented Architecture and Web Services
