The Quest for Reliable AI Accelerators: Cross-Layer Evaluation and Design Optimization

Meng Li; Tong Xie; Zuodong Zhang; Runsheng Wang

arXiv:2601.14148 · cs.AR · January 21, 2026

TL;DR

This paper introduces cross-layer analysis tools and design strategies to improve the reliability and efficiency of AI accelerators facing nanoscale aging and variation effects.

Contribution

It presents a systematic framework for reliability-aware AI accelerator design, integrating cross-layer modeling and workload-specific optimizations.

Findings

1. Developed aging- and variation-aware timing analysis tools
2. Optimized dataflow to reduce critical input patterns
3. Designed resilient architectures for large language models

Abstract

As CMOS technology scales to the nanoscale, aging effects and process variations have become increasingly pronounced, posing significant reliability challenges for AI accelerators. Traditional guardband-based design approaches rely on pessimistic timing margins. They therefore sacrifice substantial performance and computational efficiency, which makes them inadequate for the demands of high-performance AI computing. Current reliability-aware AI accelerator design faces two core challenges: (1) the lack of systematic cross-layer analysis tools that capture coupled reliability effects across the device, circuit, architecture, and application layers; and (2) the fundamental trade-off between conventional reliability optimization and computational efficiency. To address these challenges, this paper systematically presents a series of reliability-aware accelerator designs, encompassing (1) aging and…
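
The following minimal Python sketch shows how much performance a static guardband costs. It assumes that the guardband is a fixed fractional margin added to the nominal critical-path delay, so the clock period stretches by that fraction. The 1.0 ns delay and 20% margin are hypothetical illustration values, not figures from the paper. The function names `max_frequency_ghz` and `performance_loss` are likewise hypothetical.

```python
def max_frequency_ghz(critical_path_ns: float, guardband: float = 0.0) -> float:
    """Highest clock frequency (GHz) after adding a fractional timing guardband.

    Assumption: the guardband is a hypothetical margin expressed as a fraction
    of the nominal critical-path delay (e.g. 0.10 for a 10% aging/variation margin).
    """
    if critical_path_ns <= 0 or guardband < 0:
        raise ValueError("delay must be positive and guardband non-negative")
    # The clock period must cover the nominal delay plus the pessimistic margin.
    clock_period_ns = critical_path_ns * (1.0 + guardband)
    return 1.0 / clock_period_ns


def performance_loss(guardband: float) -> float:
    """Fraction of throughput lost relative to running with no guardband."""
    return 1.0 - 1.0 / (1.0 + guardband)


# Hypothetical numbers only (not from the paper): 1.0 ns nominal critical path.
nominal = max_frequency_ghz(1.0)
pessimistic = max_frequency_ghz(1.0, guardband=0.20)
print(f"{nominal:.3f} GHz -> {pessimistic:.3f} GHz, loss {performance_loss(0.20):.1%}")
```

Under these assumptions, a 20% margin reduces the clock from 1.000 GHz to about 0.833 GHz, a throughput loss of roughly 16.7%. Cross-layer, reliability-aware designs aim to recover this margin rather than reserve it for worst-case conditions.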

Taxonomy

Topics: Semiconductor materials and devices · Radiation Effects in Electronics · Low-power high-performance VLSI design