FLARE: One-Shot PE-Level Fault Localization in Systolic Arrays via Algebraic Test Vectors
Logashree Venkatasubramanian (1), Zishen Wan (1), Viveck Cadambe (1) ((1) Georgia Institute of Technology)

TL;DR
This paper introduces coprime test vectors for one-shot PE-level fault localization in systolic arrays: a lightweight, purely algorithmic method that requires no hardware redundancy.
Contribution
It proposes a purely algorithmic approach using coprime test vectors for precise PE-level fault localization without hardware redundancy.
Findings
A single test pass localizes the faulty row with high probability for INT16 arrays up to 256x256.
A second pass with ratio computation achieves exact localization when needed.
For single-bit errors, odd coprime entries enable one-round exact localization.
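The single-bit finding can be illustrated with a small sketch. It assumes a simplified model that the listing does not spell out: a bit flip in the weight register of row r changes the stored weight by ±2^k, so the column output deviates by x[r] * (±2^k), and the accumulator does not overflow. The helper names `odd_primes` and `localize_single_bit` are hypothetical, and using the first odd primes as test entries is only one possible choice of odd, pairwise coprime values.

```python
def odd_primes(n):
    """Return the first n odd primes (odd and pairwise coprime)."""
    primes, cand = [], 3
    while len(primes) < n:
        # cand is odd, so trial division by the odd primes found so far suffices
        if all(cand % p for p in primes):
            primes.append(cand)
        cand += 2
    return primes


def localize_single_bit(deviation, x):
    """Return every row whose test entry divides the observed deviation."""
    return [i for i, xi in enumerate(x) if deviation % xi == 0]


# Assumed model: deviation = x[faulty_row] * (+/- 2**bit).
x = odd_primes(8)            # [3, 5, 7, 11, 13, 17, 19, 23]
faulty_row, bit = 5, 9
deviation = x[faulty_row] * (1 << bit)

rows = localize_single_bit(deviation, x)
```

Under this model, an odd entry greater than 1 never divides a power of two, so the only test entry that can divide the deviation is the faulty row's own, which is why a single round suffices in the sketch.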
Abstract
Systolic arrays are the dominant compute fabric for neural network inference. Prior work has addressed column-level fault detection efficiently with uniform test patterns, but row-level (PE-level) fault localization within a faulty column remains open without resorting to hardware redundancy. The fundamental obstacle is that uniform test inputs destroy per-row signatures: any test that activates every row equally cannot distinguish which row is the source of an observed deviation. In this paper, we propose a lightweight, purely algorithmic remedy based on coprime test vectors. By assigning pairwise coprime integers as test-input entries, a permanent weight-register fault produces a deviation whose divisibility signature uniquely identifies the faulty row. Under a general bounded error model, a single test pass localizes the faulty row with high probability. This error model covers a…
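The divisibility signature described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: one column is modeled as an integer dot product, the fault adds an error e to a single stored weight, and the second pass uses an all-ones vector so that the ratio of the two deviations equals the faulty row's test entry. The function names, the weights, and the choice of second-pass vector are all hypothetical.

```python
def column_output(x, w):
    """Model one column as an exact integer dot product (no overflow)."""
    return sum(xi * wi for xi, wi in zip(x, w))


def candidate_rows(deviation, x):
    """Rows whose coprime test entry divides the observed deviation."""
    if deviation == 0:
        return []
    return [i for i, xi in enumerate(x) if deviation % xi == 0]


def localize_by_ratio(dev1, dev2, x1, x2):
    """Rows consistent with dev1 / dev2 == x1[i] / x2[i] (cross-multiplied)."""
    return [i for i in range(len(x1)) if dev1 * x2[i] == dev2 * x1[i]]


x1 = [2, 3, 5, 7]          # pairwise coprime test entries (assumed choice)
x2 = [1, 1, 1, 1]          # assumed second-pass vector
golden = [4, -1, 6, 2]     # hypothetical fault-free weights

# Case A: error e = 1 in row 2; one pass is enough.
faulty_a = [4, -1, 7, 2]
dev_a = column_output(x1, faulty_a) - column_output(x1, golden)
rows_a = candidate_rows(dev_a, x1)

# Case B: error e = 6 in row 2; e shares factors with other entries,
# so the first pass is ambiguous and the ratio from a second pass resolves it.
faulty_b = [4, -1, 12, 2]
dev_b1 = column_output(x1, faulty_b) - column_output(x1, golden)
dev_b2 = column_output(x2, faulty_b) - column_output(x2, golden)
rows_b_first = candidate_rows(dev_b1, x1)
rows_b_exact = localize_by_ratio(dev_b1, dev_b2, x1, x2)
```

In case A the deviation is 5, which only the row-2 entry divides. In case B the deviation is 30, which the entries 2, 3, and 5 all divide, so the single pass leaves three candidates; the second-pass deviation is 6, and 30 / 6 = 5 singles out row 2.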
