The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
Zhouqi Hua, Wenwei Zhang, Chengqi Lyu, Yuzhe Gu, Songyang Gao, Kuikun Liu, Dahua Lin, Kai Chen

TL;DR
This paper introduces TAIL, a method that enhances the length generalization of large language models by imitating Turing Machine processes through synthetic chain-of-thought data, leading to improved reasoning over longer sequences.
Contribution
The paper proposes TAIL, a novel approach that synthesizes Turing Machine-like reasoning data to significantly improve LLMs' length generalization capabilities on diverse algorithmic tasks.
Findings
TAIL outperforms previous methods on synthetic datasets
Qwen2.5-7B shows improved performance with TAIL
Turing Machine concepts are crucial for length generalization
Abstract
Length generalization, the ability to solve problems with longer sequences than those observed during training, is a core challenge for Transformer-based large language models (LLMs). Although existing studies have predominantly focused on data-driven approaches for arithmetic operations and symbolic manipulation tasks, these approaches tend to be task-specific and offer limited overall performance. To pursue a more general solution, this paper focuses on a broader class of reasoning problems that are computable, i.e., problems that algorithms can solve and that can therefore be solved by a Turing Machine. From this perspective, this paper proposes Turing MAchine Imitation Learning (TAIL) to improve the length generalization ability of LLMs. TAIL uses computer programs to synthesize chain-of-thought (CoT) data that imitate the execution process of a Turing Machine, which linearly expands the reasoning steps…
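The abstract's core idea, using a program to turn an algorithm into a chain of thought whose length grows linearly with the input, can be illustrated with a minimal sketch. The example assumes multi-digit addition as the task. The trace format (explicit reads from named positions, one atomic operation per step, an explicit state update) is a hypothetical illustration, not the paper's actual TAIL data format, and the function name `tail_style_addition_trace` is invented for this example.

```python
from typing import List


def tail_style_addition_trace(a: str, b: str) -> List[str]:
    """Emit a linear, step-by-step trace for adding two decimal strings.

    Hypothetical format (not the paper's): each step
    (1) reads the operand digits it needs from explicitly named positions,
    (2) applies one atomic operation (single-digit add with carry), and
    (3) records the new state (written digit and carry).
    """
    width = max(len(a), len(b))
    # Zero-pad so both operands have the same length; positions below
    # index into these padded strings.
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    out_digits: List[str] = []
    steps: List[str] = []
    for i in range(width):
        pos = width - 1 - i  # move the "head" from right to left
        da, db = int(a[pos]), int(b[pos])
        total = da + db + carry
        digit, new_carry = total % 10, total // 10
        steps.append(
            f"step {i}: read a[{pos}]={da}, b[{pos}]={db}, carry={carry}; "
            f"write {digit}, carry={new_carry}"
        )
        out_digits.append(str(digit))
        carry = new_carry
    if carry:
        out_digits.append(str(carry))
        steps.append(f"step {width}: write final carry {carry}")
    result = "".join(reversed(out_digits))
    steps.append(f"answer: {result}")
    return steps


if __name__ == "__main__":
    for line in tail_style_addition_trace("987", "45"):
        print(line)
```

Because the program emits one step per digit position, the trace length grows linearly with the operand length, so longer inputs simply produce longer traces built from the same atomic steps. Each step restates the values it reads rather than relying on earlier context, loosely mirroring the "explicit memory management" that one reviewer highlights.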
Peer Reviews
Decision: ICLR 2026 Poster
1. The idea of imitating Turing Machine execution is interesting and conceptually appealing. 2. TAIL is orthogonal to other approaches such as Index Hint, making it easily combinable with them.
1. The paper lacks sufficient elaboration and analysis on why TAIL improves length generalization. 2. The experimental design does not convincingly demonstrate TAIL’s effectiveness. Specifically, the comparison between the original Qwen2.5-7B and the model fine-tuned on task-specific data is not fair. Moreover, the paper does not include comparisons with other prompt-engineering methods, such as Program-of-Thought, to justify the claimed advantages of TAIL.
1. This paper is well-motivated, studying length generalization, an important problem for LLMs. 2. The proposed method requires only synthetic data, which makes it easy to extend to more tasks. 3. The paper performed comprehensive experiments and achieved improvements across diverse benchmarks, demonstrating strong generality and robustness.
1. Limited practical significance: the benchmarks largely rely on deterministic symbolic computation. The so-called Chain-of-Thought effectively becomes a **Chain of Computations**, simply executing predefined procedural steps rather than engaging in genuine high-level reasoning. As a result, the improvements may reflect better simulation of algorithmic traces rather than any substantive enhancement in “reasoning ability.” 2. Limited transferability to real-world reasoning scenarios. The wo
1. The Turing machine alignment provides a principled approach to structured reasoning, systematically addressing length generalization through linearized execution and explicit memory management. 2. Evaluation across 8 algorithm classes and 18 tasks demonstrates impressive universality beyond simple tasks commonly studied.
1. Incomplete baselines. The paper doesn't compare against standard CoT fine-tuning, making it unclear whether the gains come from Turing imitation or simply from SFT. Comparisons are limited to untrained base models. 2. The approach requires manually writing Python code for each task to generate TAIL-formatted CoT data, creating significant engineering overhead. This limits real-world adoption where users need automated solutions for new tasks. 3. The paper overlooks important connections to relevant
Taxonomy
Topics: Computability, Logic, AI Algorithms
