Understanding the Failure Modes of Transformers through the Lens of Graph Neural Networks
Hunjae Lee

TL;DR
This paper analyzes the failure modes of decoder-only transformers using graph neural network theory, revealing information propagation bottlenecks and geometric properties that lead to predictable failures, and offers a unified theoretical perspective on existing solutions.
Contribution
The paper introduces a GNN-based framework for understanding transformer failures, connecting them to known GNN information-propagation bottlenecks and providing a theoretical basis for improving existing solutions.
Findings
Transformers share failure modes with GNNs related to information propagation.
The causal (masked) attention structure of decoder-only transformers creates geometric properties that affect information flow.
Existing solutions can be better understood and improved through a unified theoretical perspective.
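To make the geometric point about causal structure concrete, here is a minimal sketch. It assumes the common view of a causal attention mask as a directed graph in which token i can read from every token j ≤ i. This is an illustrative assumption, not necessarily the paper's own formalism. Stacking L layers, the number of distinct routes from token j to token i is entry (i, j) of the L-th power of that adjacency matrix. The sketch counts paths only and ignores attention weights. The function names are hypothetical.

```python
# Illustrative sketch (not the paper's method): count attention paths in a
# stack of causal self-attention layers viewed as a directed graph.

def causal_adjacency(n):
    """A[i][j] = 1 if token i may attend to token j (j <= i), else 0."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def path_counts_to_last_token(n, num_layers):
    """Number of length-`num_layers` attention paths from each token to token n-1."""
    a = causal_adjacency(n)
    power = a
    for _ in range(num_layers - 1):
        power = matmul(power, a)  # power = A^(layer count so far)
    return power[n - 1]  # row of the last token: paths arriving from each source j


if __name__ == "__main__":
    for layers in (1, 2, 3):
        print(layers, path_counts_to_last_token(4, layers))
```

With 4 tokens and 3 layers, the counts toward the last position are `[10, 6, 3, 1]`: the first token reaches the final position through ten routes, while the final token reaches itself through only one. This gives one simple picture of how causal masking can make information flow position-dependent and asymmetric.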
Abstract
Transformers, and more specifically decoder-only transformers, dominate modern LLM architectures. While they have been shown to work exceptionally well, they are not without issues, resulting in surprising failure modes and predictably asymmetric performance degradation. This article is a study of many of these observed failure modes of transformers through the lens of graph neural network (GNN) theory. We first make the case that much of deep learning, including transformers, is about learnable information mixing and propagation. This makes the study of model failure modes a study of bottlenecks in information propagation. This naturally leads to GNN theory, where there is already a rich literature on information propagation bottlenecks and theoretical failure modes of models. We then make the case that many issues faced by GNNs are also experienced by transformers. In addition, we analyze…
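The abstract's framing treats a transformer layer as learnable information mixing, which is what connects it to GNN theory. The sketch below illustrates that reading under stated assumptions. It writes one single-head causal self-attention layer as message passing: each node (token) aggregates value vectors from its in-neighbours j ≤ i, weighted by a softmax over scaled dot-product scores. This uses the standard scaled dot-product attention, with no multi-head, residual, or normalization components. The function names and toy inputs are hypothetical and do not come from the paper.

```python
import math


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def causal_attention_message_passing(x, w_q, w_k, w_v):
    """One single-head causal self-attention layer written as message passing.

    x is a list of n feature vectors. Node i's neighbourhood in the causal
    graph is {j : j <= i}; it receives the messages v_j from those neighbours,
    weighted by a softmax of scaled dot-product scores within that neighbourhood.
    """
    q = [matvec(w_q, xi) for xi in x]  # learnable projections define the mixing
    k = [matvec(w_k, xi) for xi in x]
    v = [matvec(w_v, xi) for xi in x]
    d = len(q[0])
    out = []
    for i in range(len(x)):
        neighbours = range(i + 1)  # causal mask as graph adjacency: j <= i
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in neighbours]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # edge weights on incoming edges
        # Aggregate: weighted sum of neighbour messages, one output channel at a time.
        out.append([sum(wt * v[j][c] for wt, j in zip(weights, neighbours))
                    for c in range(len(v[0]))])
    return out
```

Because the first token's only neighbour is itself, its output is its own value vector. Changing a later token never alters earlier outputs. Both properties follow directly from the causal graph.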
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Big Data and Digital Economy
