TL;DR
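This paper introduces ThinkARM, a framework that abstracts language-model reasoning traces into functional steps such as Analysis, Explore, Implement, and Verify. Applied to mathematical problem solving, it reveals reproducible thinking dynamics and structural differences between reasoning and non-reasoning models.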
Contribution
It applies Schoenfeld's Episode Theory to build an explicit, scalable method for analyzing language-model reasoning traces at the level of functional steps rather than surface-level token statistics.
Findings
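Reveals reproducible thinking dynamics across diverse models on mathematical problem solving.
Shows structural differences between reasoning and non-reasoning models that are not apparent from token-level views.
Identifies exploration as a critical branching step associated with correctness.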
Abstract
Large language models increasingly expose reasoning traces, yet their underlying cognitive structure and steps remain difficult to identify and analyze beyond surface-level statistics. We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models), a scalable framework that explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, Verify, etc. When applied to mathematical problem solving by diverse models, this abstraction reveals reproducible thinking dynamics and structural differences between reasoning and non-reasoning models, which are not apparent from token-level views. We further present two diagnostic case studies showing that exploration functions as a critical branching step associated with correctness, and that efficiency-oriented methods selectively…
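To make the idea of abstracting a trace into functional steps concrete, here is a minimal sketch in Python. It assumes a trace has already been labeled sentence by sentence with the step names mentioned in the abstract; the labeled trace, the helper names (`collapse_runs`, `transition_counts`, `label_shares`), and the choice of transition counts as a summary are illustrative assumptions, not the paper's actual pipeline or metrics.

```python
from collections import Counter
from itertools import groupby

# Hypothetical example: one functional label per sentence of a reasoning trace.
# The label names come from the abstract; the sequence itself is made up.
labeled_trace = [
    "Analysis", "Analysis", "Explore", "Implement", "Implement",
    "Verify", "Explore", "Implement", "Verify",
]

def collapse_runs(labels):
    """Merge consecutive identical labels into a single episode."""
    return [label for label, _ in groupby(labels)]

def transition_counts(labels):
    """Count how often one episode type is directly followed by another."""
    episodes = collapse_runs(labels)
    return Counter(zip(episodes, episodes[1:]))

def label_shares(labels):
    """Fraction of sentences assigned to each functional label."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}

if __name__ == "__main__":
    print(collapse_runs(labeled_trace))
    print(transition_counts(labeled_trace))
    print(label_shares(labeled_trace))
```

A summary like this works on step labels rather than tokens, which is the kind of view the abstract contrasts with surface-level statistics. For instance, how often a Verify episode is followed by another Explore episode is visible here but not in raw token counts.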
