IterResearch: Rethinking Long-Horizon Agents with Interaction Scaling
Guoxin Chen, Zile Qiao, Xuanzhong Chen, Donglei Yu, Haotian Xu, Wayne Xin Zhao, Ruihua Song, Wenbiao Yin, Huifeng Yin, Liwen Zhang, Kuan Li, Minpeng Liao, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

TL;DR
IterResearch introduces an iterative, interaction-scaled paradigm for long-horizon research agents, overcoming context limitations and enhancing reasoning through strategic workspace reconstruction and adaptive training, leading to significant performance improvements.
Contribution
The paper presents a novel iterative paradigm with interaction scaling and a new training strategy, improving long-horizon reasoning and performance over existing methods.
Findings
Achieves +14.5pp improvement across six benchmarks.
Extends interaction scaling to 2048 interactions with performance gains.
Enhances frontier models by up to 19.2pp using IterResearch as a prompting strategy.
Abstract
Recent advances in deep-research agents have shown promise for autonomous knowledge construction through dynamic reasoning over external sources. However, existing approaches rely on a mono-contextual paradigm that accumulates all information in a single, expanding context window, leading to context suffocation and noise contamination that limit their effectiveness on long-horizon tasks. We introduce IterResearch, a novel iterative deep-research paradigm that revisits long-horizon research through the lens of Interaction Scaling. Instead of relying on linear context accumulation, we adopt an MDP-inspired architecture with strategic workspace reconstruction. By maintaining an evolving report as memory and periodically synthesizing insights, our approach preserves consistent reasoning capacity across arbitrary exploration depths. To effectively train this paradigm, we employ…
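The workspace-reconstruction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Workspace`, `Decision`, `reconstruct`, and `run`, the `"answer:"` action prefix, and the toy policy and environment are all hypothetical, chosen only to show a loop in which each round sees the question, the evolving report, and the latest observation rather than the full history.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Workspace:
    """What the agent sees in one round (hypothetical layout)."""
    question: str
    report: str            # evolving report used as memory
    last_action: str       # most recent tool call
    last_observation: str  # most recent tool response only


@dataclass(frozen=True)
class Decision:
    """One round of output: reasoning, updated report, next action."""
    think: str
    report: str
    action: str


def reconstruct(question: str, decision: Decision, observation: str) -> Workspace:
    # The next workspace is rebuilt from the updated report and the newest
    # observation; earlier thoughts and tool responses are dropped.
    return Workspace(question, decision.report, decision.action, observation)


def run(
    question: str,
    policy: Callable[[Workspace], Decision],
    environment: Callable[[str], str],
    max_rounds: int = 8,
) -> Tuple[Optional[str], Workspace]:
    ws = Workspace(question, "", "", "")
    for _ in range(max_rounds):
        decision = policy(ws)
        if decision.action.startswith("answer:"):  # assumed stop convention
            return decision.action[len("answer:"):].strip(), ws
        observation = environment(decision.action)
        ws = reconstruct(question, decision, observation)
    return None, ws  # round budget exhausted without an answer


# Toy stand-ins for an LLM policy and a search tool (illustrative only).
def toy_policy(ws: Workspace) -> Decision:
    if "Paris" in ws.last_observation:
        return Decision("found it", ws.report + " capital=Paris", "answer: Paris")
    return Decision("need to search", ws.report, "search: capital of France")


def toy_environment(action: str) -> str:
    return "Paris is the capital of France."


answer, final_ws = run("What is the capital of France?", toy_policy, toy_environment)
```

In this sketch the per-round input is bounded by the size of the report plus one observation, which is the property the abstract attributes to the iterative paradigm; how the report is actually synthesized and trained is not specified here.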
Peer Reviews
Decision·ICLR 2026 Poster
- The problem is well-motivated: mono-contextual approaches accumulate information in a single context window, causing noise and limiting effectiveness on long-horizon tasks.
- The presented approach significantly outperforms current state-of-the-art models in solution quality on selected benchmarks.
W1: The novelty of the approach is somewhat limited, as many of its core ideas, such as using discounted rewards and iterative refinement of trajectories, are already widely used in classical reinforcement learning.
W2: The MDP formulation appears incomplete. It omits a reward function (e.g., a verification signal), and the transition function is likely not deterministic, since tool outputs can vary over time (e.g., search results). The decision space is unconventional: including Think and Repo
1. Clear, principled formulation. The Markovian workspace reconstruction is simple and well-motivated, giving constant context size and avoiding "context suffocation"/"noise contamination". The formal transition $s_{t+1}=T(s_t,d_t,E(a_t))$ is explicit.
2. Strong empirical coverage. Six benchmarks with diverse characteristics; competitive vs. proprietary systems on several, e.g., surpassing OpenAI DeepResearch on HLE and BrowseComp-zh.
3. General usefulness as a prompting recipe. IterResearch
1. Markovian constant context is a conceptual simplification rather than a true breakthrough. The claimed $O(1)$ workspace complexity is only formal when the evolving report $|M|$ is treated as fixed. In practice, $|M|$ grows with the number of synthesized summaries, and each tool response $|TR|$ can vary widely across steps. Therefore, the overall process still depends on in memory and computational complexity, merely folded into a different structure. This makes the "constant" context claim mo
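The reviewer's point about workspace size can be made concrete with a short worked comparison. The notation below is an assumption for illustration only ($q$ for the question, $d_i$ for the round-$i$ decision, $TR_i$ for the round-$i$ tool response, $M_t$ for the report at round $t$, and the constants $B$ and $\delta$); it is not taken from the paper.

```latex
% Mono-contextual agent: the context keeps every past decision and tool response.
|c_t| = |q| + \sum_{i=1}^{t} \bigl( |d_i| + |TR_i| \bigr) = \Theta(t)
\quad \text{(for per-round sizes bounded above and below)}

% Iterative agent: the workspace holds the question, the report, and the latest response.
|s_t| = |q| + |M_t| + |TR_t|

% Case 1: the report is capped, |M_t| \le B for all t.
|s_t| \le |q| + B + \max_i |TR_i| = O(1) \text{ in } t

% Case 2: each round appends about \delta tokens to the report, |M_t| \approx \delta t.
|s_t| \approx |q| + \delta t + |TR_t| = \Theta(t)
```

Under these assumptions the constant-size claim holds only in the first case, which is the condition the reviewer says the paper treats as given.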
1. The paper is clear and well-written.
2. The overall idea is intuitive.
3. Results demonstrate significant improvement (although there are some open questions here).
Overall, the results clearly seem to advance the SOTA, so I'm hoping for more clarifications on my identified weaknesses. I'm happy to engage in discussion and revise my score if my concerns are adequately addressed.
1. I think one of the major claims of the paper is the Markovian State Reconstruction. There is not really any guarantee that this report is Markovian in any sense. The Markovian State suggests that the future state is independent of the history given the current state. If that were
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
