Think, But Don't Overthink: Reproducing Recursive Language Models
Daren Wang

TL;DR
This study investigates the effects of increasing recursion depth in Recursive Language Models, revealing that deeper recursion can impair performance and significantly increase computational costs, contrary to initial expectations.
Contribution
It provides an empirical analysis of how recursion depth affects RLMs built on open-source models, highlighting the risk of overthinking in recursive prompting.
Findings
Deeper recursion (depth=2) degrades performance on reasoning tasks.
Deeper recursion significantly increases execution time and token costs.
Depth-1 RLMs improve accuracy on complex reasoning tasks.
Abstract
This project reproduces and extends the recently proposed "Recursive Language Models" (RLMs) framework by Zhang et al. (2026). This framework enables Large Language Models (LLMs) to process near-infinite contexts by offloading the prompt into an external REPL environment. While the original paper relies on a default recursion depth of 1 and suggests deeper recursion as a future direction, this study specifically investigates the impact of scaling the recursion depth. Using state-of-the-art open-source agentic models (DeepSeek v3.2 and Kimi K2), I evaluated a pure LLM, RLM (depth=1), and RLM (depth=2) on the S-NIAH and OOLONG benchmarks. The findings reveal a compelling phenomenon: deeper recursion causes models to "overthink". While depth-1 RLMs effectively boost accuracy on complex reasoning tasks, applying deeper recursion (depth=2) or using RLMs on simple retrieval tasks…
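To give a feel for why recursion depth affects cost, here is a minimal toy sketch of depth-limited recursive calls. It is not the RLM implementation from the paper: in an actual RLM the model itself writes code in the REPL to inspect the prompt and decide when to issue sub-calls, whereas this sketch assumes a fixed split into `fanout` pieces. All names here (`toy_model`, `recursive_answer`, `CallCounter`, `build_haystack`, `fanout`) are hypothetical, and `toy_model` is a string filter standing in for an LLM call.

```python
from typing import Callable, List

ModelFn = Callable[[str, str], str]


def toy_model(query: str, context: str) -> str:
    """Hypothetical stand-in for an LLM call: return lines mentioning the query."""
    return "\n".join(line for line in context.splitlines() if query in line)


class CallCounter:
    """Wrap a model function and count how many times it is invoked."""

    def __init__(self, fn: ModelFn) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, query: str, context: str) -> str:
        self.calls += 1
        return self.fn(query, context)


def split_lines(context: str, fanout: int) -> List[str]:
    """Split the context into at most `fanout` groups of whole lines."""
    lines = context.splitlines()
    step = max(1, -(-len(lines) // fanout))  # ceiling division
    return ["\n".join(lines[i:i + step]) for i in range(0, len(lines), step)]


def recursive_answer(model: ModelFn, query: str, context: str,
                     depth: int, fanout: int = 4) -> str:
    """depth=0 is a single direct call; each extra level adds a layer of sub-calls."""
    if depth == 0:
        return model(query, context)
    # Assumption: a fixed split; a real RLM lets the model choose how to decompose.
    partial = [recursive_answer(model, query, piece, depth - 1, fanout)
               for piece in split_lines(context, fanout)]
    merged = "\n".join(p for p in partial if p)
    # One extra call at this level to aggregate the sub-call results.
    return model(query, merged)


def build_haystack(n_lines: int = 64, needle_at: int = 37) -> str:
    """Synthetic needle-in-a-haystack context (illustrative data only)."""
    lines = [f"line {i}: filler" for i in range(n_lines)]
    lines[needle_at] = f"line {needle_at}: needle=42"
    return "\n".join(lines)
```

On the 64-line haystack with `fanout=4`, this toy returns the same answer at every depth but makes 1, 5, and 21 model calls at depths 0, 1, and 2. That geometric growth in call count is only a rough analogy for the rising execution time and token cost summarized above, not a measurement from the study.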
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications
