# Dynamic Fault Tolerance Through Resource Pooling

**Authors:** Christian M. Fuchs, Nadia M. Murillo, Aske Plaat, Erik van der Kouwe, Todor Stefanov

arXiv: 1902.09493 · 2019-02-26

## TL;DR

This paper presents a hybrid fault-tolerance method for miniaturized satellites using resource pooling and software-based detection, enabling adaptable, energy-efficient, and long-duration mission support with strong fault coverage.

## Contribution

It introduces a software-centric fault-tolerance approach that pools resources within an MPSoC. The approach targets small satellites and can also be adapted for larger spacecraft.

## Key findings

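- The FPGA-based proof-of-concept implementation delivers strong fault coverage, even for long-duration missions.
- Operator-defined performance profiles let the on-board computer trade between processing capacity, fault coverage, and energy consumption using simple heuristics.
- Spare resource pooling lets an on-board computer handle permanent faults more efficiently, including aboard larger spacecraft.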

## Abstract

Miniaturized satellites are currently not considered suitable for critical, high-priority, and complex multi-phased missions, due to their low reliability. As hardware-side fault tolerance (FT) solutions designed for larger spacecraft cannot be adopted aboard very small satellites due to budget, energy, and size constraints, we developed a hybrid FT approach based upon only COTS components, commodity processor cores, library IP, and standard software. This approach facilitates fault detection, isolation, and recovery in software, and utilizes fault-coverage techniques across the embedded stack within a multiprocessor system-on-chip (MPSoC). This allows our FPGA-based proof-of-concept implementation to deliver strong fault coverage even for missions with a long duration, but also to adapt to varying performance requirements during the mission. The operator of a spacecraft utilizing this approach can define performance profiles, which allow an on-board computer (OBC) to trade between processing capacity, fault coverage, and energy consumption using simple heuristics. The software-side FT approach developed also offers advantages if deployed aboard larger spacecraft through spare resource pooling, enabling an OBC to more efficiently handle permanent faults. This FT approach in part mimics a critical biological system's way of tolerating and adjusting to failures, enabling graceful ageing of an MPSoC.
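To make the idea of performance profiles over a pool of cores more concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `Profile`, `CorePool`, `select_profile`, and `allocate`, the three example profiles, and the "first profile that fits" heuristic are all assumptions made for illustration. It assumes each profile asks for a number of task groups and a number of replica cores per group, and that cores not assigned to a group stay in the pool as spares.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Profile:
    """Hypothetical performance profile: more replicas means more fault
    coverage, more groups means more processing capacity, and fewer active
    cores means lower energy use."""
    name: str
    replicas: int  # cores that run the same work so results can be compared
    groups: int    # independent groups of replicated cores

    @property
    def cores_needed(self) -> int:
        return self.replicas * self.groups


@dataclass
class CorePool:
    """Hypothetical pool of processor cores in an MPSoC."""
    healthy: Set[int]
    failed: Set[int] = field(default_factory=set)

    def mark_failed(self, core: int) -> None:
        # A permanently faulty core leaves the pool of usable resources.
        self.healthy.discard(core)
        self.failed.add(core)


def select_profile(pool: CorePool, preference: List[Profile]) -> Optional[Profile]:
    """Assumed heuristic: take the first profile, in operator preference
    order, that still fits into the healthy cores."""
    for profile in preference:
        if profile.cores_needed <= len(pool.healthy):
            return profile
    return None


def allocate(pool: CorePool, profile: Profile) -> Tuple[List[List[int]], List[int]]:
    """Split healthy cores into replica groups; the remainder are spares."""
    cores = sorted(pool.healthy)
    if len(cores) < profile.cores_needed:
        raise ValueError("not enough healthy cores for profile " + profile.name)
    size = profile.replicas
    assigned = [cores[i * size:(i + 1) * size] for i in range(profile.groups)]
    spares = cores[profile.cores_needed:]
    return assigned, spares


# Example profiles (illustrative values only).
SAFE = Profile("safe", replicas=3, groups=2)
BALANCED = Profile("balanced", replicas=2, groups=2)
LOW_POWER = Profile("low_power", replicas=2, groups=1)
PREFERENCE = [SAFE, BALANCED, LOW_POWER]
```

With eight healthy cores, `select_profile(CorePool(set(range(8))), PREFERENCE)` picks `SAFE`. After three cores are marked failed, only five remain, so the same call falls back to `BALANCED` and `allocate` leaves one core as a spare. This is one simple way an ageing system could degrade gracefully instead of failing outright.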

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.09493/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1902.09493/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1902.09493/full.md

---
Source: https://tomesphere.com/paper/1902.09493