Why Attention Fails: A Taxonomy of Faults in Attention-Based Neural Networks

Sigma Jahan; Saurabh Singh Rajput; Tushar Sharma; Mohammad Masudur Rahman

arXiv:2508.04925 · cs.SE · November 4, 2025


TL;DR

This paper provides the first comprehensive empirical analysis of faults in attention-based neural networks, introducing a new taxonomy of attention-specific faults and diagnostic heuristics to improve understanding and troubleshooting.

Contribution

It presents a novel taxonomy of seven attention-specific fault categories and evidence-based heuristics, addressing a critical gap in diagnosing attention mechanism failures.

Findings

01. Over half of the faults in attention-based neural networks (ABNNs) stem from attention-specific mechanisms

02. Four diagnostic heuristics explain 33% of faults

03. The analysis covers 555 faults from 96 projects across multiple frameworks

Abstract

Attention mechanisms are at the core of modern neural architectures, powering systems ranging from ChatGPT to autonomous vehicles and driving a major economic impact. However, high-profile failures, such as ChatGPT's nonsensical outputs or Google's suspension of Gemini's image generation due to attention weight errors, highlight a critical gap: existing deep learning fault taxonomies might not adequately capture the unique failures introduced by attention mechanisms. This gap leaves practitioners without actionable diagnostic guidance. To address this gap, we present the first comprehensive empirical study of faults in attention-based neural networks (ABNNs). Our work is based on a systematic analysis of 555 real-world faults collected from 96 projects across ten frameworks, including GitHub, Hugging Face, and Stack Overflow. Through our analysis, we develop a novel taxonomy comprising…
