InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning

Yuchen Yan; Liang Jiang; Jin Jiang; Shuaicheng Li; Zujie Wen; Zhiqiang Zhang; Jun Zhou; Jian Shao; Yueting Zhuang; Yongliang Shen

arXiv:2602.06960 · cs.CL · February 10, 2026

Open Access

TL;DR

InftyThink+ is a reinforcement learning framework that optimizes iterative reasoning processes, improving accuracy, efficiency, and out-of-distribution generalization in large reasoning models by learning when to summarize and how to resume reasoning.

Contribution

Introduces an end-to-end RL approach for optimizing iterative reasoning strategies in large reasoning models, using a two-stage training scheme: a supervised cold-start followed by trajectory-level reinforcement learning.

Findings

01. Improves accuracy by 21% on AIME24

02. Outperforms conventional chain-of-thought RL methods

03. Reduces inference latency and accelerates training

Abstract

Large reasoning models achieve strong performance by scaling inference-time chain-of-thought, but this paradigm suffers from quadratic cost, context length limits, and degraded reasoning due to lost-in-the-middle effects. Iterative reasoning mitigates these issues by periodically summarizing intermediate thoughts, yet existing methods rely on supervised learning or fixed heuristics and fail to optimize when to summarize, what to preserve, and how to resume reasoning. We propose InftyThink+, an end-to-end reinforcement learning framework that optimizes the entire iterative reasoning trajectory, building on model-controlled iteration boundaries and explicit summarization. InftyThink+ adopts a two-stage training scheme with supervised cold-start followed by trajectory-level reinforcement learning, enabling the model to learn strategic summarization and continuation decisions. Experiments…
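
To make the abstract's idea of iterative reasoning concrete, here is a minimal Python sketch of a summarize-and-resume loop, plus a rough proxy for why splitting one long chain into segments reduces quadratic attention cost. It is an illustration under stated assumptions, not the paper's implementation: the names `iterative_reason`, `make_toy_step`, and `pairwise_attention_cost`, the prompt format, and the stub "model" are all hypothetical, and the cost proxy ignores the tokens spent on the question and on summaries.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical step interface (an assumption, not from the paper):
# given a prompt, return (reasoning, summary, answer), where answer is
# None until the model decides it has finished.
StepOutput = Tuple[str, str, Optional[str]]


def iterative_reason(
    question: str,
    step: Callable[[str], StepOutput],
    max_iters: int = 8,
) -> Tuple[Optional[str], List[str]]:
    """Run reasoning in segments, carrying forward only the latest summary."""
    summary = ""
    summaries: List[str] = []
    for _ in range(max_iters):
        # Each segment sees the question plus the previous summary,
        # never the full earlier reasoning text.
        prompt = question if not summary else f"{question}\n\nProgress so far: {summary}"
        _reasoning, summary, answer = step(prompt)  # reasoning is discarded
        summaries.append(summary)
        if answer is not None:
            return answer, summaries
    return None, summaries  # iteration budget exhausted without an answer


def make_toy_step(total_steps: int) -> Callable[[str], StepOutput]:
    """Stub standing in for a model: finishes after `total_steps` segments."""
    state = {"calls": 0}

    def step(prompt: str) -> StepOutput:
        state["calls"] += 1
        k = state["calls"]
        reasoning = f"(segment {k} reasoning)"
        summary = f"finished {k} of {total_steps} parts"
        answer = "done" if k == total_steps else None
        return reasoning, summary, answer

    return step


def pairwise_attention_cost(segment_lengths: List[int]) -> int:
    """Crude proxy: self-attention over n tokens touches about n^2 token pairs."""
    return sum(n * n for n in segment_lengths)
```

Under this proxy, one 12,000-token chain costs `pairwise_attention_cost([12000])` = 144,000,000 token pairs, while three 4,000-token segments cost `pairwise_attention_cost([4000] * 3)` = 48,000,000, a threefold reduction before accounting for summary overhead. In this sketch the stub decides when a segment ends; the paper's contribution is to learn those boundary and summarization decisions with reinforcement learning rather than fix them by hand.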

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications