Ecoscape: Fault Tolerance Benchmark for Adaptive Remediation Strategies in Real-Time Edge ML
Hendrik Reiter, Ahmad Rzgar Hamid, Florian Schlösser, Mikkel Baun Kjærgaard, Wilhelm Hasselbring

TL;DR
Ecoscape is a benchmark framework that evaluates fault remediation strategies in edge ML systems using chaos engineering, enabling fair comparison and optimization of fault tolerance without physical testbeds.
Contribution
It introduces Ecoscape, a novel benchmark for assessing and comparing remediation strategies in fault-prone edge ML environments, supporting domain-specific objectives.
Findings
Ecoscape effectively simulates realistic fault scenarios.
It provides a quantifiable score for remediation strategy performance.
It demonstrates utility in optimizing fault tolerance in edge ML inference.
Abstract
Edge computing offers significant advantages for real-time data processing tasks, such as object recognition, by reducing network latency and bandwidth usage. However, edge environments are susceptible to various types of faults. A remediator is an automated software component designed to adjust the configuration parameters of a software service dynamically. Its primary function is to maintain the service's operational state within predefined Service Level Objectives by applying corrective actions in response to deviations from these objectives. Remediators can be implemented on top of the Kubernetes container orchestration tool, using remediation strategies such as rescheduling or adjusting application parameters. However, there is currently no method to compare these remediation strategies fairly. This paper introduces Ecoscape, a comprehensive benchmark designed to evaluate the…
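To make the remediator concept from the abstract concrete, here is a minimal Python sketch of one remediation step that adjusts an application parameter when a latency objective is violated. It is not Ecoscape's API or the authors' implementation: the names ServiceConfig, LatencySLO, remediate, and the frames_per_second parameter are hypothetical and chosen only for illustration, as is the halve-on-violation policy.

```python
from dataclasses import dataclass


@dataclass
class ServiceConfig:
    # Hypothetical tunable parameter of an edge inference service.
    frames_per_second: int


@dataclass
class LatencySLO:
    # Hypothetical objective: observed latency should stay at or below this bound.
    max_latency_ms: float


def remediate(config: ServiceConfig, observed_latency_ms: float, slo: LatencySLO,
              min_fps: int = 1, max_fps: int = 30) -> ServiceConfig:
    """Return an adjusted configuration based on one latency observation."""
    if observed_latency_ms > slo.max_latency_ms:
        # SLO violated: halve the processing rate, but never go below min_fps.
        return ServiceConfig(max(min_fps, config.frames_per_second // 2))
    if observed_latency_ms < 0.5 * slo.max_latency_ms:
        # Comfortable headroom: cautiously restore the rate by one step.
        return ServiceConfig(min(max_fps, config.frames_per_second + 1))
    # Within the objective without much headroom: leave the configuration alone.
    return config
```

A real remediator would run such a step in a control loop against live metrics and could also choose other corrective actions, such as rescheduling the workload to a different node. A benchmark of the kind described above would compare how well different policies of this sort keep the service within its objectives under injected faults.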
