FARe: Fault-Aware GNN Training on ReRAM-based PIM Accelerators

Pratyush Dhingra; Chukwufumnanya Ogbogu; Biresh Kumar Joardar; Janardhan Rao Doppa; Ananth Kalyanaraman; Partha Pratim Pande

arXiv:2401.10522·cs.AR·January 22, 2024·2 cites

Open Access

TL;DR

FARe is a fault-aware training framework for GNNs on ReRAM-based PIM accelerators that restores test accuracy by 47.6% on faulty hardware with roughly 1% timing overhead.

Contribution

The paper introduces FARe, a fault-aware training framework that mitigates the effect of hardware faults during GNN training on ReRAM-based PIM accelerators and outperforms existing fault-tolerant approaches in both accuracy and timing overhead.

Findings

01  Restores GNN test accuracy by 47.6% on faulty ReRAM hardware.

02  Incurs approximately 1% timing overhead compared to the fault-free counterpart.

03  Outperforms existing fault-tolerant approaches in both accuracy and timing overhead.

Abstract

Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architecture is an attractive solution for training Graph Neural Networks (GNNs) on edge platforms. However, the immature fabrication process and limited write endurance of ReRAMs make them prone to hardware faults, thereby limiting their widespread adoption for GNN training. Further, existing fault-tolerant solutions prove inadequate for effectively training GNNs in the presence of faults. In this paper, we propose a fault-aware framework, referred to as FARe, that mitigates the effect of faults during GNN training. FARe outperforms existing approaches in terms of both accuracy and timing overhead. Experimental results demonstrate that the FARe framework can restore GNN test accuracy by 47.6% on faulty ReRAM hardware with a ~1% timing overhead compared to the fault-free counterpart.

Taxonomy

Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Advanced Graph Neural Networks