Transformers Provably Learn Algorithmic Solutions for Graph Connectivity, But Only with the Right Data
Qilin Ye, Deqing Fu, Robin Jia, Vatsal Sharan

TL;DR
Combining theoretical proofs with empirical evidence, this paper shows that Transformers can learn algorithmic solutions for graph connectivity when their training data stays within the model's capacity.
Contribution
It provides a theoretical analysis of how Transformers learn graph algorithms and shows that data within the model's capacity is crucial for learning exact solutions.
Findings
Transformers can compute graph connectivity using matrix powers.
Training data within model capacity leads to exact algorithm learning.
Beyond-capacity data results in heuristic-based solutions.
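To make the matrix-power view of connectivity concrete, here is a minimal sketch in plain Python. It is not the paper's Transformer construction, only the classical computation the findings refer to: raising (I + A) to a power under boolean arithmetic marks every pair of nodes joined by a walk of at most that many hops. The function names and the example graph `path_adj` are illustrative assumptions.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is True if some k links i to j."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def reachable_within(adj, steps):
    """Return R where R[i][j] is True iff j is within `steps` hops of i.

    Adding self-loops (I + A) makes each multiplication accumulate all
    walks of length <= the current step count, not just exact lengths.
    """
    n = len(adj)
    step = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    result = [[i == j for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = bool_matmul(result, step)
    return result


# Hypothetical example: a path 0-1-2-3 plus an isolated node 4.
path_adj = [[0] * 5 for _ in range(5)]
for u, v in [(0, 1), (1, 2), (2, 3)]:
    path_adj[u][v] = path_adj[v][u] = 1

two_hops = reachable_within(path_adj, 2)
three_hops = reachable_within(path_adj, 3)
```

With only two multiplications, node 3 is not yet marked reachable from node 0; a third multiplication is needed. A fixed budget of multiplications therefore only resolves connectivity up to a bounded distance, which is the intuition behind a capacity limit on graph diameter.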
Abstract
Transformers often fail to learn generalizable algorithms, instead relying on brittle heuristics. Using graph connectivity as a testbed, we explain this phenomenon both theoretically and empirically. We consider a simplified Transformer architecture, the Disentangled Transformer, and prove that a model with a given number of layers can compute connectivity in graphs up to a diameter bound set by its depth, implementing an algorithm equivalent to computing powers of the adjacency matrix. By analyzing training dynamics, we prove that whether the model learns this strategy hinges on whether most training instances are within this model capacity. Within-capacity graphs (diameter at most that bound) drive the learning of the algorithmic solution, while beyond-capacity graphs drive the learning of a simple heuristic based on node degrees. Finally, we empirically show that restricting training data to stay within a model's capacity…
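The idea of restricting training data to within-capacity graphs can be sketched as a simple diameter filter. This is an illustrative assumption about how such a filter might look, not the authors' data pipeline: the `capacity` threshold is a hypothetical parameter, graphs are given as dictionaries mapping each node to its neighbor list, and "diameter" is taken here to be the longest shortest path between any two connected nodes (so disconnected graphs are still measurable).

```python
from collections import deque


def bfs_distances(neighbors, source):
    """Shortest-path hop counts from `source` to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def max_finite_distance(neighbors):
    """Longest shortest path over all connected node pairs (non-empty graph)."""
    return max(max(bfs_distances(neighbors, s).values()) for s in neighbors)


def within_capacity(graphs, capacity):
    """Keep only graphs whose diameter does not exceed `capacity`."""
    return [g for g in graphs if max_finite_distance(g) <= capacity]


# Hypothetical examples: a 4-node path (diameter 3) and a triangle (diameter 1).
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

kept = within_capacity([path4, triangle], capacity=2)
```

Here only the triangle survives a capacity of 2, since the path requires three hops end to end.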
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Stochastic Gradient Optimization Techniques
