To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers
Kevin Xu, Issei Sato

TL;DR
This paper formally compares Chain-of-Thought and Looped Transformers, identifying the reasoning tasks on which each is stronger and offering guidance for choosing between them in practice.
Contribution
It provides a formal analysis distinguishing the capabilities of CoT and Looped Transformers, clarifying when each approach is more effective.
Findings
Looped Transformers efficiently simulate parallel computations for deterministic tasks, formalized as evaluation over directed acyclic graphs.
CoT with stochastic decoding excels at approximate inference for compositional structures, namely self-reducible problems.
These separations indicate which reasoning paradigm suits a task, based on the task's structure.
Abstract
Chain-of-Thought (CoT) and Looped Transformers have been shown to empirically improve performance on reasoning tasks and to theoretically enhance expressivity by recursively increasing the number of computational steps. However, their comparative capabilities are still not well understood. In this paper, we provide a formal analysis of their respective strengths and limitations. We show that Looped Transformers can efficiently simulate parallel computations for deterministic tasks, which we formalize as evaluation over directed acyclic graphs. In contrast, CoT with stochastic decoding excels at approximate inference for compositional structures, namely self-reducible problems. These separations suggest the tasks for which depth-driven recursion is more suitable, thereby offering practical cues for choosing between reasoning paradigms.
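The contrast between step-by-step and depth-driven evaluation of a directed acyclic graph can be illustrated with a small sketch. This is not the paper's construction: the toy graph, the function names `evaluate_sequential` and `evaluate_looped`, and the step-counting convention are all assumptions made for illustration. The sequential routine computes one node per step, loosely mirroring a CoT-style trace, while the looped routine computes every ready node in each pass, so its pass count tracks the depth of the graph rather than its size.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy representation: node name -> (operation, list of parent names).
Dag = Dict[str, Tuple[Callable[..., int], List[str]]]


def evaluate_sequential(dag: Dag, inputs: Dict[str, int], order: List[str]):
    """Evaluate one node per step, in a given topological order."""
    values = dict(inputs)
    steps = 0
    for name in order:
        op, parents = dag[name]
        values[name] = op(*(values[p] for p in parents))
        steps += 1
    return values, steps


def evaluate_looped(dag: Dag, inputs: Dict[str, int]):
    """Evaluate all nodes whose parents are known in each pass (one 'loop')."""
    values = dict(inputs)
    loops = 0
    # Assumes input names and internal node names are disjoint.
    while len(values) < len(dag) + len(inputs):
        ready = {
            name: op(*(values[p] for p in parents))
            for name, (op, parents) in dag.items()
            if name not in values and all(p in values for p in parents)
        }
        if not ready:
            raise ValueError("graph has a cycle or a missing input")
        values.update(ready)
        loops += 1
    return values, loops


# Toy example (an assumption, not from the paper): out = (a + b) * (c + d).
toy_inputs = {"a": 1, "b": 2, "c": 3, "d": 4}
toy_dag: Dag = {
    "s1": (lambda x, y: x + y, ["a", "b"]),
    "s2": (lambda x, y: x + y, ["c", "d"]),
    "out": (lambda x, y: x * y, ["s1", "s2"]),
}

seq_values, seq_steps = evaluate_sequential(toy_dag, toy_inputs, ["s1", "s2", "out"])
loop_values, loop_count = evaluate_looped(toy_dag, toy_inputs)
print(seq_values["out"], seq_steps)    # 21 after 3 sequential steps
print(loop_values["out"], loop_count)  # 21 after 2 parallel passes
```

On this toy graph both routines reach the same value, but the sequential trace needs one step per internal node while the looped version needs one pass per layer. The sketch only conveys the intuition behind the size-versus-depth distinction; it says nothing about the formal expressivity results themselves.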
Taxonomy
Topics: Scientific Research and Philosophical Inquiry · Quantum Mechanics and Applications · Cognitive Computing and Networks
