In-Situ Hardware Error Detection Using Specification-Derived Petri Net Models and Behavior-Derived State Sequences
Tomonari Tanaka, Takumi Uezono, Kohei Suenaga, Masanori Hashimoto

TL;DR
This paper presents two methods for in-situ detection of control-flow errors in hardware accelerators, one based on specification-derived Petri nets and one on behavior-derived state sequences. Both detect most injected failures at an area overhead of a few percent to around 10% in most cases.
Contribution
The paper introduces specification-derived Petri nets and behavior-derived state sequences as monitors for control-flow errors in hardware accelerators, and validates both across four designs.
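To make the Petri net idea concrete, the following is a minimal sketch of a monitor that tracks a marking and flags an error whenever an observed control event corresponds to a transition that is not enabled. The three-place net (idle, load, compute), the event names, and the helper `check_trace` are hypothetical illustrations and are not taken from the paper, whose actual nets and hardware implementation are not described in this summary.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy net (an assumption, not from the paper):
# each transition maps to (input places, output places).
TRANSITIONS: Dict[str, Tuple[List[str], List[str]]] = {
    "start": (["idle"], ["load"]),
    "loaded": (["load"], ["compute"]),
    "done": (["compute"], ["idle"]),
}


class PetriNetMonitor:
    """Tracks a marking and flags events that the net does not allow."""

    def __init__(self, initial_marking: Dict[str, int]) -> None:
        self.marking = dict(initial_marking)
        self.error = False

    def enabled(self, transition: str) -> bool:
        # A transition is enabled when every input place holds a token.
        inputs, _ = TRANSITIONS[transition]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def observe(self, transition: str) -> bool:
        """Fire the transition if it is enabled; otherwise raise the error flag."""
        if transition not in TRANSITIONS or not self.enabled(transition):
            self.error = True
            return False
        inputs, outputs = TRANSITIONS[transition]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1
        return True


def check_trace(trace: List[str]) -> Optional[int]:
    """Return the index of the first illegal event, or None if the trace is legal."""
    monitor = PetriNetMonitor({"idle": 1})
    for i, event in enumerate(trace):
        if not monitor.observe(event):
            return i
    return None
```

In this sketch, `check_trace(["start", "loaded", "done"])` returns `None`, while `check_trace(["start", "done"])` returns `1` because `done` fires without a token in `compute`, which is the kind of control-flow deviation an upset might cause.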
Findings
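The proposed detectors catch 48% to 100% of failures, with high detection rates in both datapath and control logic.
The maximum detection rate is reached at an area overhead of a few percent to around 10% in most cases.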
Effective detection of control register upsets and control input perturbations.
Abstract
In hardware accelerators used in data centers and safety-critical applications, soft errors and the resulting silent data corruption significantly compromise reliability, particularly when upsets occur in control-flow operations, where they can lead to severe failures. To address this, we introduce two methods for monitoring control flows: one using specification-derived Petri nets and one using behavior-derived state transitions. We validated our methods across four designs: a convolutional layer operation, Gaussian blur, AES encryption, and a Network-on-Chip router. Our fault injection campaign targeting the control registers and primary control inputs demonstrated high error detection rates in both datapath and control logic. Synthesis results show that the maximum detection rate is achieved with an area overhead of a few percent to around 10% in most cases. The proposed detectors quickly detect 48% to 100% of failures…
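As an illustration of the behavior-derived approach, the sketch below learns the set of state transitions seen in fault-free runs and then flags any transition outside that set. The 2-bit cyclic state encoding, the golden trace, and the helpers `learn_transitions`, `first_violation`, and `flip_bit` are assumptions made for this example. The single-cycle bit flip is a simplified stand-in for a control register upset and is not the paper's fault injection setup.

```python
from typing import List, Optional, Set, Tuple

Transition = Tuple[int, int]


def learn_transitions(golden_traces: List[List[int]]) -> Set[Transition]:
    """Collect every (previous state, next state) pair seen in fault-free runs."""
    allowed: Set[Transition] = set()
    for trace in golden_traces:
        allowed.update(zip(trace, trace[1:]))
    return allowed


def first_violation(trace: List[int], allowed: Set[Transition]) -> Optional[int]:
    """Return the cycle at which an unseen transition lands, or None if all are known."""
    for i, pair in enumerate(zip(trace, trace[1:])):
        if pair not in allowed:
            return i + 1
    return None


def flip_bit(trace: List[int], cycle: int, bit: int) -> List[int]:
    """Simplified upset model (assumption): corrupt one state value for one cycle."""
    faulty = list(trace)
    faulty[cycle] ^= 1 << bit
    return faulty


# Hypothetical golden run of a 4-state controller cycling 0 -> 1 -> 2 -> 3 -> 0.
GOLDEN = [[0, 1, 2, 3, 0, 1, 2, 3, 0]]
ALLOWED = learn_transitions(GOLDEN)
```

Here `first_violation(GOLDEN[0], ALLOWED)` returns `None`, and `first_violation(flip_bit(GOLDEN[0], 2, 0), ALLOWED)` returns `2`, since the flip turns state 2 into 3 and the pair (1, 3) never occurs in the golden run. A monitor of this kind can only be as complete as the fault-free behavior it was trained on, so a legal but unobserved transition would be reported as an error.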
Taxonomy
Topics: Radiation Effects in Electronics · Distributed systems and fault tolerance · Petri Nets in System Modeling
