FogROS2-FT: Fault Tolerant Cloud Robotics
Kaiyuan Chen, Kush Hari, Trinity Chung, Michael Wang, Nan Tian, Christian Juette, Jeffrey Ichnowski, Liu Ren, John Kubiatowicz, Ion Stoica, and Ken Goldberg

TL;DR
FogROS2-FT enhances cloud robotics by providing fault tolerance through multi-cloud replication, enabling reliable and cost-effective robotic services despite cloud failures and network issues.
Contribution
FogROS2-FT introduces a multi-cloud extension for cloud robotics that automatically replicates stateless services and routes requests to the replicas, improving fault tolerance and reducing costs.
Findings
Achieves up to 2.2x cost reduction in motion planning.
Reduces 99th percentile latency by up to 5.53x.
Improves fault tolerance for object detection and segmentation.
Abstract
Cloud robotics enables robots to offload complex computational tasks to cloud servers for performance and ease of management. However, cloud compute can be costly, cloud services can suffer occasional downtime, and connectivity between the robot and cloud can be prone to variations in network Quality-of-Service (QoS). We present FogROS2-FT (Fault Tolerant) to mitigate these issues by introducing a multi-cloud extension that automatically replicates independent stateless robotic services, routes requests to these replicas, and directs the first response back. With replication, robots can still benefit from cloud computations even when a cloud service provider is down or there is low QoS. Additionally, many cloud computing providers offer low-cost spot computing instances that may shut down unpredictably. Normally, these low-cost instances would be inappropriate for cloud robotics, but the…
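The replicate, route, and first-response pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the FogROS2-FT implementation and does not use any ROS 2 API; it only assumes that each replica is stateless and can be modeled as an async callable. The names `first_response`, `make_replica`, and `demo`, along with the replica labels and delays, are hypothetical and chosen purely for illustration.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# A replica is modeled as an async callable: request in, response out.
# (Assumption for this sketch; real replicas would be remote services.)
Replica = Callable[[str], Awaitable[str]]


async def first_response(replicas: Sequence[Replica], request: str) -> str:
    """Send the same request to every replica and return the first success."""
    tasks = [asyncio.ensure_future(replica(request)) for replica in replicas]
    errors: List[Exception] = []
    try:
        # as_completed yields awaitables in the order the replicas finish.
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except Exception as exc:
                # A failed replica is tolerated as long as another one answers.
                errors.append(exc)
        raise RuntimeError(f"all {len(tasks)} replicas failed: {errors}")
    finally:
        # Stop paying for work whose answer is no longer needed.
        for task in tasks:
            task.cancel()


def make_replica(name: str, delay: float, fail: bool = False) -> Replica:
    """Build a simulated replica with a fixed latency and optional failure."""
    async def call(request: str) -> str:
        await asyncio.sleep(delay)
        if fail:
            raise ConnectionError(f"{name} unavailable")
        return f"{name} handled {request}"
    return call


def demo() -> str:
    replicas = [
        make_replica("cloud_a", 0.05),             # slower provider
        make_replica("cloud_b", 0.01),             # faster provider
        make_replica("spot_c", 0.0, fail=True),    # preempted spot instance
    ]
    return asyncio.run(first_response(replicas, "plan"))


if __name__ == "__main__":
    print(demo())
```

In this sketch the simulated spot instance fails immediately, so the caller receives the reply from the faster of the two remaining replicas. Because the services are assumed stateless, discarding the slower duplicate responses has no side effects.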
Taxonomy
Topics: IoT and Edge/Fog Computing · Cloud Computing and Resource Management · Distributed systems and fault tolerance
