Industrial Computing Systems: A Case Study of Fault Tolerance Analysis

Andrey A. Shchurov

arXiv:1503.08715·cs.SY·March 31, 2015

TL;DR

This paper analyzes the instantaneous failure rate of industrial computing systems at the end of their lifetime and determines maintenance scheduling that extends fault-tolerant operation under financial constraints.

Contribution

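It analyzes the end-of-life failure rate, identifies the basic condition under which a critical increase in the system failure rate occurs, and determines maintenance scheduling that avoids this effect and extends system lifetime in fault-tolerant mode.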

Findings

01. Failure rate increases critically at end-of-life

02. Maintenance scheduling can mitigate failure risks

03. Extended fault-tolerant operation is achievable

Abstract

Fault tolerance is a key factor in the design of industrial computing systems. In practical terms, however, these systems, like every commercial product, are under great financial constraints, and to stay commercially attractive they have to remain in an operational state as long as possible. This work provides an analysis of the instantaneous failure rate of these systems at the end of their lifetime. On the basis of this analysis, we determine the effect of a critical increase in the system failure rate and the basic condition for its existence. The next step determines the maintenance scheduling that can help to avoid this effect and to extend the system lifetime in fault-tolerant mode.
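The abstract does not state the model used, so the following is only a minimal sketch of what an instantaneous failure rate looks like for a redundant system. It assumes a textbook setting that may differ from the paper's: n identical components in active parallel, each with a constant failure rate, and no repair. The function name `parallel_hazard` and the value of `lam` are hypothetical. Under these assumptions the system failure rate h(t) = f(t) / R(t) starts at zero and climbs toward the single-component rate as redundancy is used up.

```python
import math


def parallel_hazard(t, lam, n=2):
    """Instantaneous failure rate h(t) = f(t) / R(t) for n identical
    components in active parallel, each with constant failure rate lam.

    Assumed illustrative model (not taken from the paper): no repair,
    independent failures, system works while any component works.
    """
    q = 1.0 - math.exp(-lam * t)  # probability that one component has failed by t
    # R(t) = 1 - q**n and f(t) = n * q**(n-1) * lam * exp(-lam*t).
    # Since 1 - q = exp(-lam*t), the ratio simplifies to the stable form below.
    return n * lam * q ** (n - 1) / sum(q ** k for k in range(n))


lam = 1e-4  # hypothetical component failure rate, per hour
for hours in (0, 1_000, 10_000, 50_000, 100_000):
    print(f"t = {hours:>7} h   h(t) = {parallel_hazard(hours, lam):.3e} per hour")
```

In this sketch, restoring failed components during maintenance would reset q toward zero and pull h(t) back down, which is the intuition behind scheduling maintenance before the rate rises too far.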
