The Illusion of Procedural Reasoning: Measuring Long-Horizon FSM Execution in LLMs