Why Attention Fails: A Taxonomy of Faults in Attention-Based Neural Networks
Sigma Jahan, Saurabh Singh Rajput, Tushar Sharma, Mohammad Masudur Rahman

TL;DR
This paper provides the first comprehensive empirical analysis of faults in attention-based neural networks, introducing a new taxonomy of attention-specific faults and diagnostic heuristics to improve understanding and troubleshooting.
Contribution
The paper presents a novel taxonomy of seven attention-specific fault categories, along with evidence-based diagnostic heuristics, addressing a critical gap in diagnosing attention mechanism failures.
Findings
More than half of the faults in attention-based neural networks (ABNNs) stem from attention-specific mechanisms.
Four diagnostic heuristics explain 33% of faults.
The analysis covers 555 faults from 96 projects across multiple frameworks.
Abstract
Attention mechanisms are at the core of modern neural architectures, powering systems ranging from ChatGPT to autonomous vehicles and driving a major economic impact. However, high-profile failures, such as ChatGPT's nonsensical outputs or Google's suspension of Gemini's image generation due to attention weight errors, highlight a critical gap: existing deep learning fault taxonomies might not adequately capture the unique failures introduced by attention mechanisms. This gap leaves practitioners without actionable diagnostic guidance. To address this gap, we present the first comprehensive empirical study of faults in attention-based neural networks (ABNNs). Our work is based on a systematic analysis of 555 real-world faults collected from 96 projects across ten frameworks, including GitHub, Hugging Face, and Stack Overflow. Through our analysis, we develop a novel taxonomy comprising…
