Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent
Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi

TL;DR
This paper provides a theoretical analysis demonstrating that trained one-layer multi-head transformers can provably learn symbolic multi-step reasoning tasks, with guarantees of generalization and insights into the emergence of reasoning abilities.
Contribution
It offers the first provable guarantees for how shallow transformers learn multi-step symbolic reasoning via gradient descent, explaining the emergence of reasoning capabilities.
Findings
Trained one-layer transformers can solve chain-of-thought reasoning tasks with generalization guarantees.
Attention heads learn to specialize and coordinate to perform complex reasoning steps.
Shallow multi-head transformers can implement multi-step reasoning typically associated with deeper models.
Abstract
Transformers have demonstrated remarkable capabilities in multi-step reasoning tasks. However, understanding of the underlying mechanisms by which they acquire these abilities through training remains limited, particularly from a theoretical standpoint. This work investigates how transformers learn to solve symbolic multi-step reasoning problems through chain-of-thought processes, focusing on path-finding in trees. We analyze two intertwined tasks: a backward reasoning task, where the model outputs a path from a goal node to the root, and a more complex forward reasoning task, where the model implements two-stage reasoning by first identifying the goal-to-root path and then reversing it to produce the root-to-goal path. Our theoretical analysis, grounded in the dynamics of gradient descent, shows that trained one-layer transformers can provably solve both tasks with generalization…
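To make the two tasks concrete, the following is a minimal Python sketch of the input-output behavior a model is asked to reproduce; it is not the paper's transformer, its token encoding, or its training procedure. The tree is assumed to arrive as a list of (parent, child) edges, and the names `parent_map`, `backward_path`, and `forward_path` are hypothetical, chosen only for illustration. Each appended node corresponds to one chain-of-thought step.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (an assumption, not the paper's format): (parent, child).
Edge = Tuple[int, int]


def parent_map(edges: List[Edge]) -> Dict[int, int]:
    """Map each child node to its unique parent (well-defined because the graph is a tree)."""
    parents: Dict[int, int] = {}
    for parent, child in edges:
        if child in parents:
            raise ValueError(f"node {child} has two parents; input is not a tree")
        parents[child] = parent
    return parents


def backward_path(edges: List[Edge], goal: int) -> List[int]:
    """Backward task: emit goal -> ... -> root, one parent lookup per reasoning step."""
    parents = parent_map(edges)
    path = [goal]
    while path[-1] in parents:  # the root is the only node without a parent
        path.append(parents[path[-1]])
    return path


def forward_path(edges: List[Edge], goal: int) -> List[int]:
    """Forward task: stage 1 finds the goal-to-root path, stage 2 reverses it."""
    return list(reversed(backward_path(edges, goal)))


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5)]
    print(backward_path(edges, 4))  # [4, 1, 0]
    print(forward_path(edges, 4))   # [0, 1, 4]
```

Writing `forward_path` as a reversal of `backward_path` mirrors the two-stage structure described in the abstract: the forward task strictly contains the backward one, which is why it is the harder of the two.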
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Child and Animal Learning Development
