Fault Tolerance in Distributed Neural Computing

Anton Kulakov; Mark Zwolinski; Jeff Reeve

arXiv:1509.09199·cs.NE·October 7, 2015

TL;DR

This paper analyses the fault tolerance of a distributed feed-forward neural network with decentralised event-driven time management, examining its robustness to intermittent hardware and communication faults during both learning and operation.

Contribution

It demonstrates that distributed neural networks with local learning rules can maintain functionality despite hardware and communication faults, offering insights into scalable fault-tolerant systems.

Findings

01

Neural networks exhibit intrinsic fault-tolerance during learning and operation.

02

Fault injection increases overhead but does not compromise overall system performance.

03

Distributed, local-rule-based networks are resilient to hardware and communication failures.

Abstract

With the increasing complexity of computing systems, complete hardware reliability can no longer be guaranteed. We need, however, to ensure overall system reliability. One of the most important features of artificial neural networks is their intrinsic fault-tolerance. The aim of this work is to investigate whether such networks have features that can be applied to wider computational systems. This paper presents an analysis, in both the learning and operational phases, of a distributed feed-forward neural network with decentralised event-driven time management, which is insensitive to intermittent faults caused by unreliable communication or faulty hardware components. The learning rules used in the model are local in space and time, which allows efficient scalable distributed implementation. We investigate the overhead caused by injected faults and analyse the sensitivity to limited…
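To make the idea of a local learning rule under intermittent communication faults concrete, the following is a minimal sketch, not the authors' model. It assumes a single threshold unit trained with the classic perceptron rule on a toy logical-OR task, and it models a fault as an input message that is dropped and arrives as 0. The names `transmit`, `train`, `accuracy`, and `drop_prob` are hypothetical and chosen only for illustration. Each weight update uses only that weight's own received input and the unit's current error, which is the sense in which the rule is local.

```python
import random

# Toy task (an assumption for illustration): logical OR, which is linearly separable.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def transmit(x, drop_prob, rng):
    """Send each input over an unreliable link; a dropped message arrives as 0."""
    received, dropped = [], 0
    for value in x:
        if rng.random() < drop_prob:
            received.append(0)
            dropped += 1
        else:
            received.append(value)
    return received, dropped


def predict(weights, bias, x):
    """Threshold unit: fire (1) when the weighted sum plus bias is positive."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0


def train(drop_prob, epochs=20, seed=0):
    """Perceptron rule: each weight update uses only its own input and the unit's error."""
    rng = random.Random(seed)
    weights, bias, total_dropped = [0.0, 0.0], 0.0, 0
    for _ in range(epochs):
        for x, target in DATA:
            received, dropped = transmit(x, drop_prob, rng)
            total_dropped += dropped
            error = target - predict(weights, bias, received)
            weights = [w + error * v for w, v in zip(weights, received)]
            bias += error
    return weights, bias, total_dropped


def accuracy(weights, bias, drop_prob=0.0, seed=1):
    """Operational phase: classify each pattern, optionally over faulty links."""
    rng = random.Random(seed)
    correct = 0
    for x, target in DATA:
        received, _ = transmit(x, drop_prob, rng)
        correct += predict(weights, bias, received) == target
    return correct / len(DATA)


if __name__ == "__main__":
    # Compare training under increasing fault rates, then evaluate on fault-free links.
    for p in (0.0, 0.1, 0.3):
        w, b, dropped = train(p)
        print(f"drop_prob={p}: dropped={dropped}, fault-free accuracy={accuracy(w, b)}")
```

Passing a non-zero `drop_prob` to `accuracy` as well injects faults in the operational phase rather than the learning phase, so the two can be studied separately in this toy setting.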
