The Scaling Properties of Implicit Deductive Reasoning in Transformers

Enrico Vompa; Tanel Tammet

arXiv:2605.04330·cs.AI·May 7, 2026

TL;DR

This paper studies how sufficiently deep Transformers with a bidirectional prefix mask can perform deductive reasoning over Horn clauses implicitly. Implicit reasoning approaches explicit Chain of Thought (CoT) performance across graph topologies and problem widths, but explicit CoT remains necessary for depth extrapolation.

Contribution

It shows that sufficiently deep Transformers with a bidirectional prefix mask can perform deductive reasoning implicitly, which reduces reliance on explicit Chain of Thought for tasks that stay within the proof depths seen in training.
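To make the bidirectional masking concrete, here is a minimal sketch of a standard prefix-LM attention mask: positions inside the prefix attend to each other in both directions, while later positions attend causally. The function name `prefix_mask` and its parameters are hypothetical, and whether the paper builds its mask in exactly this way is an assumption; the summary only names the mask type.

```python
def prefix_mask(seq_len: int, prefix_len: int) -> list[list[bool]]:
    """Build a prefix-LM attention mask (illustrative, not the paper's code).

    mask[i][j] is True when query position i may attend to key position j.
    Keys inside the prefix are visible to every position; keys after the
    prefix are visible only to positions at or after them (causal).
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must lie between 0 and seq_len")
    return [
        [j < prefix_len or j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Example: 4 tokens, the first 2 form the bidirectional prefix.
mask = prefix_mask(seq_len=4, prefix_len=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With `prefix_len=0` the same function reduces to an ordinary causal mask, which makes the two settings easy to compare.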

Findings

01 Implicit reasoning approaches explicit CoT performance in deep models.

02 Enforcing algorithmic alignment improves reasoning capabilities.

03 Explicit CoT remains necessary for depth extrapolation.

Abstract

We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit CoT performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.
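For readers unfamiliar with the task, the sketch below shows deduction over propositional Horn clauses by forward chaining, and records the round in which each atom is first derived. That round count is one natural reading of proof depth; the paper's exact definition, its data format, and the names `forward_chain`, `facts`, and `rules` are assumptions made for illustration only.

```python
def forward_chain(facts, rules):
    """Derive every provable atom and the round in which it first appears.

    facts: iterable of atoms taken as given (depth 0).
    rules: list of (body, head) pairs, read as "body_1 and ... and body_n
           implies head", i.e. definite Horn clauses.
    Returns a dict mapping each derivable atom to its minimal derivation depth.
    """
    depth = {atom: 0 for atom in facts}
    round_no = 0
    changed = True
    while changed:
        changed = False
        round_no += 1
        known = set(depth)  # snapshot: only atoms from earlier rounds count
        for body, head in rules:
            if head not in depth and all(atom in known for atom in body):
                depth[head] = round_no
                changed = True
    return depth


# Hypothetical toy problem: "d" needs a chain of two rule applications.
facts = {"a", "b"}
rules = [
    (("c",), "d"),       # c -> d
    (("a", "b"), "c"),   # a and b -> c
    (("e",), "f"),       # e -> f (never fires, "e" is not provable)
]
depths = forward_chain(facts, rules)
print(depths.get("d"))   # depth of the query atom "d"
print("f" in depths)     # whether "f" is provable
```

An explicit CoT trace corresponds to writing out the intermediate atoms round by round, whereas implicit reasoning must settle provability of the query within a single forward pass.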
