The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

Zhouqi Hua; Wenwei Zhang; Chengqi Lyu; Yuzhe Gu; Songyang Gao; Kuikun Liu; Dahua Lin; Kai Chen

arXiv:2507.13332·cs.CL·January 29, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces TAIL, a method that enhances the length generalization of large language models by imitating the execution process of a Turing Machine through synthetic chain-of-thought data, leading to improved reasoning over longer sequences.

Contribution

The paper proposes TAIL, a novel approach that synthesizes Turing Machine-like reasoning data to significantly improve LLMs' length generalization capabilities on diverse algorithmic tasks.

Findings

01. TAIL outperforms previous methods on synthetic datasets.
02. Qwen2.5-7B shows improved performance with TAIL.
03. Turing Machine concepts are crucial for length generalization.

Abstract

Length generalization, the ability to solve problems with longer sequences than those observed during training, poses a core challenge for Transformer-based large language models (LLMs). Although existing studies have predominantly focused on data-driven approaches for arithmetic operations and symbolic manipulation tasks, these approaches tend to be task-specific and deliver limited overall performance. To pursue a more general solution, this paper focuses on a broader class of reasoning problems that are computable, i.e., problems that algorithms can solve and that can therefore be solved by a Turing Machine. From this perspective, this paper proposes Turing MAchine Imitation Learning (TAIL) to improve the length generalization ability of LLMs. TAIL uses computer programs to synthesize chain-of-thought (CoT) data that imitates the execution process of a Turing Machine, which linearly expands the reasoning steps…
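To make the idea of program-synthesized, Turing-Machine-style CoT data concrete, here is a minimal sketch for one task, multi-digit addition. A small program executes column addition one atomic step per digit position and records, at each step, which digit it fetched from each operand, what it wrote, and the carry it holds. The task choice, the trace format, and the names `Step`, `addition_trace`, and `render_cot` are hypothetical illustrations, not the paper's actual data format or generator; the sketch also assumes non-negative integers given as digit strings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    """One atomic step of a linearized addition trace (hypothetical format)."""
    position: int   # digit position being processed, counted from the right
    a_digit: int    # digit fetched from operand a (0 past its end)
    b_digit: int    # digit fetched from operand b (0 past its end)
    carry_in: int   # carry held in the machine's state before this step
    written: int    # output digit written at this position
    carry_out: int  # carry passed on to the next step


def addition_trace(a: str, b: str) -> tuple[list[Step], str]:
    """Run column addition one digit per step, recording every step."""
    steps: list[Step] = []
    out_digits: list[str] = []
    carry = 0
    for pos in range(max(len(a), len(b))):
        # Explicit fetch: read exactly one digit from each operand at this position.
        da = int(a[-1 - pos]) if pos < len(a) else 0
        db = int(b[-1 - pos]) if pos < len(b) else 0
        total = da + db + carry
        digit, carry_out = total % 10, total // 10
        steps.append(Step(pos, da, db, carry, digit, carry_out))
        out_digits.append(str(digit))
        carry = carry_out
    if carry:
        out_digits.append(str(carry))  # a leftover carry becomes the leading digit
    return steps, "".join(reversed(out_digits))


def render_cot(a: str, b: str) -> str:
    """Render the trace as plain-text CoT lines, one line per atomic step."""
    steps, answer = addition_trace(a, b)
    lines = [f"Task: {a} + {b}"]
    for s in steps:
        lines.append(
            f"[pos {s.position}] fetch a={s.a_digit}, b={s.b_digit}, "
            f"carry={s.carry_in} -> write {s.written}, carry {s.carry_out}"
        )
    final_carry = steps[-1].carry_out if steps else 0
    if final_carry:
        lines.append(f"[halt] write leading carry {final_carry}")
    lines.append(f"Answer: {answer}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_cot("957", "68"))
```

Calling `render_cot("957", "68")` yields one line per digit position, a halt line for the leftover carry, and `Answer: 1025`. Because the trace has one step per digit, its length grows linearly with the input, which loosely mirrors, in this simplified setting, the linear expansion of reasoning steps the abstract describes.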

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 3

Strengths

1. The idea of imitating Turing Machine execution is interesting and conceptually appealing.
2. TAIL is orthogonal to other approaches such as Index Hint, making it easily combinable with them.

Weaknesses

1. The paper lacks sufficient elaboration on, and analysis of, why TAIL improves length generalization.
2. The experimental design does not convincingly demonstrate TAIL’s effectiveness. Specifically, the comparison between the original Qwen2.5-7B and the model fine-tuned on task-specific data is not fair. Moreover, the paper does not include comparisons with other prompt-engineering methods, such as Program-of-Thought, to justify the claimed advantages of TAIL.

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. This paper is well-motivated, studying length generalization, an important problem for LLMs.
2. The proposed method requires only synthetic data, which makes it easier to extend to more tasks.
3. The paper performed comprehensive experiments and achieved improvements across diverse benchmarks, demonstrating strong generality and robustness.

Weaknesses

1. Limited practical significance: The benchmarks largely rely on deterministic symbolic computation. The so-called Chain-of-Thought effectively becomes a **Chain of Computations**, simply executing predefined procedural steps rather than engaging in genuine high-level reasoning. As a result, the improvements may reflect better simulation of algorithmic traces rather than any substantive enhancement in “reasoning ability.”
2. Limited transferability to real-world reasoning scenarios. The wo…

Reviewer 03 · Rating 2 · Confidence 3

Strengths

1. The Turing Machine alignment provides a principled approach to structured reasoning, systematically addressing length generalization through linearized execution and explicit memory management.
2. Evaluation across 8 algorithm classes and 18 tasks demonstrates impressive universality beyond the simple tasks commonly studied.

Weaknesses

1. Incomplete baselines. The paper doesn't compare against standard CoT fine-tuning, making it unclear whether gains come from Turing imitation or simply from SFT. Comparisons are limited to untrained base models.
2. The approach requires manually writing Python code for each task to generate TAIL-formatted CoT data, creating significant engineering overhead. This limits real-world adoption where users need automated solutions for new tasks.
3. The paper overlooks important connections to relevant…

Taxonomy

Topics: Computability, Logic, AI Algorithms