The Scaling Properties of Implicit Deductive Reasoning in Transformers
Enrico Vompa, Tanel Tammet

TL;DR
This paper shows that sufficiently deep Transformers with a bidirectional prefix mask can perform deductive reasoning over Horn clauses implicitly, approaching explicit Chain of Thought (CoT) performance across graph topologies and problem widths, though explicit CoT remains necessary for depth extrapolation.
Contribution
It demonstrates that sufficiently deep Transformers with a bidirectional prefix mask can reason implicitly, reducing reliance on explicit Chain of Thought for problems within the depth range seen in training.
Findings
Implicit reasoning approaches explicit CoT performance in sufficiently deep models with a bidirectional prefix mask.
Enforcing algorithmic alignment improves reasoning capabilities.
Explicit CoT remains necessary for depth extrapolation.
Abstract
We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit CoT performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.
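To make the two central objects in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration, not the authors' code: `prefix_mask` assumes the common prefix-LM convention (prefix positions attend to each other in both directions, later positions attend causally), and `forward_chain` is textbook forward chaining over propositional Horn clauses. All function names, the rule encoding as `(body, head)` pairs, and the example rules are assumptions made for this sketch. The number of chaining rounds it returns is one natural notion of reasoning depth, which may differ from the paper's exact definition.

```python
def prefix_mask(seq_len, prefix_len):
    """Boolean attention mask; mask[i][j] is True when position i may attend to j.

    Assumed convention (standard prefix-LM): every position sees the whole
    prefix, and positions after the prefix additionally see earlier
    non-prefix positions and themselves.
    """
    return [
        [j < prefix_len or j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


def forward_chain(facts, rules, goal):
    """Forward chaining over propositional Horn clauses.

    rules is a list of (body, head) pairs, where body is a tuple of atoms
    that must all be known before head can be derived.
    Returns the number of rounds needed to derive goal, or None if the
    goal is not provable.
    """
    known = set(facts)
    rounds = 0
    while goal not in known:
        # Fire every rule whose body is fully satisfied, all at once.
        new = {
            head
            for body, head in rules
            if head not in known and all(atom in known for atom in body)
        }
        if not new:
            return None  # fixed point reached without deriving the goal
        known |= new
        rounds += 1
    return rounds


# Hypothetical example: a -> b, b -> c, (a and c) -> d
example_rules = [(("a",), "b"), (("b",), "c"), (("a", "c"), "d")]
depth_of_d = forward_chain({"a"}, example_rules, "d")
mask = prefix_mask(4, 2)
```

In this example `d` needs three rounds of chaining, because `c` must be derived before the last rule can fire. A model that answers without CoT has to carry out the equivalent of those rounds inside a single forward pass, which is one intuition (an assumption of this sketch, not a claim taken from the paper) for why model depth matters and why explicit CoT still helps on problems deeper than those seen in training.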
