Estimating Maximum Error Impact in Dynamic Data-driven Applications for Resource-aware Adaption of Software-based Fault-Tolerance
Björn Bönninghoff, Horst Schirmeier

TL;DR
This paper introduces a method to estimate the potential impact of runtime errors in dynamic, data-driven applications, aiding resource-aware fault-tolerance strategies in embedded systems.
Contribution
It presents a novel approach for coarse-grained error impact estimation based on task data dependencies in dynamic workloads.
Findings
Effective error impact estimation for H.264 decoder
Supports resource-aware resilience in embedded systems
Enables selective fault-tolerance deployment
Abstract
The rise of transient faults in modern hardware requires system designers to consider errors occurring at runtime. Both hardware- and software-based error handling must be deployed to meet application reliability requirements. The level of required reliability can vary for system components and depend on input and state, so that a selective use of resilience methods is advised, especially for resource-constrained platforms as found in embedded systems. If an error occurring at runtime can be classified as having negligible or tolerable impact, less effort can be spent on correcting it. As the actual impact of an error often depends on the state of the system at the time of occurrence, it cannot be determined precisely for highly dynamic workloads in data-driven applications. We present a concept to estimate error propagation in sets of tasks with variable data dependencies. This allows…
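As a rough illustration of what coarse-grained impact estimation over task data dependencies could look like, the sketch below computes an upper bound on the set of tasks that a single faulty task might corrupt, by following data dependencies transitively. It is not the authors' method: the function name max_error_impact, the consumers mapping, and the frame-like task names are all hypothetical and chosen only for this example.

```python
from collections import deque


def max_error_impact(consumers, faulty_task):
    """Return every task that could observe corrupted data if faulty_task fails.

    consumers maps each task to the tasks that read its output (hypothetical
    representation). The result is a conservative upper bound: a task counts
    as affected as soon as any of its inputs may be corrupted.
    """
    affected = {faulty_task}
    queue = deque([faulty_task])
    while queue:
        task = queue.popleft()
        for dependent in consumers.get(task, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical dependency graph, loosely modelled on frames that reference
# earlier frames in a video decoder.
consumers = {
    "frame0": ["frame1", "frame2"],
    "frame1": ["frame2"],
    "frame2": [],
    "frame3": [],
}

impact = max_error_impact(consumers, "frame1")
fraction = len(impact) / len(consumers)  # share of tasks possibly affected
```

Under these assumptions, a scheduler could compare the affected fraction against a threshold to decide whether a task deserves costly protection (for example, redundant execution) or can run unprotected.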
Taxonomy
Topics: Radiation Effects in Electronics · Software Reliability and Analysis Research · Software System Performance and Reliability
