From Characterization to Microarchitecture: Designing an Elegant and Reliable BFP-Based NPU
Jie Zhang, Jiapeng Guan, Hao Zhou, Xiaomeng Han, Tinglue Wang, Ran Wei, Zhe Jiang

TL;DR
This paper conducts an empirical reliability study of BFP-based NPUs, revealing vulnerabilities and proposing a fault-tolerant microarchitecture with minimal performance and hardware overhead.
Contribution
The paper presents the first in-depth empirical reliability study of BFP-based NPUs and proposes a fault-tolerant microarchitecture that improves reliability with low overhead.
Findings
Fault injection reveals pronounced heterogeneous vulnerabilities across bits and datapaths in BFP NPUs.
Conventional end-to-end checks become ineffective under nonlinear block scaling.
The proposed microarchitecture achieves reliability close to that of dual redundancy with minimal overhead.
Abstract
Block Floating-Point (BFP) is emerging as an attractive data format for edge Neural Processing Units (NPUs), combining wide dynamic range with high hardware efficiency. However, its behavior under hardware faults and its suitability for safety-critical deployments remain underexplored. Here, we present the first in-depth empirical reliability study of BFP-based NPUs. Using RTL fault injection on NPUs, our bit- and path-level analysis reveals pronounced heterogeneous vulnerabilities and shows that conventional end-to-end checks become ineffective under nonlinear block scaling. Guided by these insights, we design a fault-tolerant BFP-based NPU microarchitecture that aligns the BFP computational semantics with reliability constraints. The design uses a row/column-wise blocking strategy to decouple the fixed-point mantissa computations from the scalar exponent path, and introduces…
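To make the split between the mantissa and exponent paths concrete, the following is a minimal sketch of generic BFP quantization with a single-bit fault applied to each path. It is not the paper's implementation: the function names, the 8-bit mantissa width, the block contents, and the software bit-flip model are all assumptions chosen for illustration.

```python
import math

def bfp_quantize(block, mant_bits=8):
    """Quantize floats to one shared exponent plus signed integer mantissas.

    Hypothetical helper: mant_bits (sign included) is an illustrative choice.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    # Pick the shared exponent so the largest value fills the mantissa range.
    shared_exp = e - (mant_bits - 1)
    limit = 2 ** (mant_bits - 1) - 1
    scale = 2.0 ** shared_exp
    mantissas = [max(-limit, min(limit, round(x / scale))) for x in block]
    return shared_exp, mantissas

def bfp_dequantize(shared_exp, mantissas):
    """Reconstruct floats: every element is scaled by the same exponent."""
    return [m * 2.0 ** shared_exp for m in mantissas]

def flip_bit(value, bit):
    """Software model of a single bit flip (two's-complement view of an int)."""
    return value ^ (1 << bit)

def count_corrupted(reference, observed):
    """Number of elements that differ from the fault-free result."""
    return sum(1 for r, o in zip(reference, observed) if r != o)

block = [0.5, -1.25, 3.0, 0.125]
exp, mants = bfp_quantize(block)
golden = bfp_dequantize(exp, mants)

# Fault in one mantissa: only that element of the block is affected.
faulty_mants = list(mants)
faulty_mants[0] = flip_bit(mants[0], 0)
mantissa_hits = count_corrupted(golden, bfp_dequantize(exp, faulty_mants))

# Fault in the shared exponent: every nonzero element is rescaled.
exponent_hits = count_corrupted(golden, bfp_dequantize(flip_bit(exp, 2), mants))

print("elements corrupted by mantissa fault:", mantissa_hits)
print("elements corrupted by exponent fault:", exponent_hits)
```

In this toy model a mantissa fault stays confined to one element, while a fault in the shared exponent rescales the whole block. This is one simple way to see why vulnerability can differ sharply between the two paths, although the paper's RTL-level analysis covers far more than this sketch.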
