Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Hongyin Luo, Nathaniel Morgan, Tina Li, Derek Zhao, Ai Vy Ngo, Philip Schroeder, Lijie Yang, Assaf Ben-Kish, Jack O'Brien, James Glass

TL;DR
This paper introduces TIM and TIMRUN, a novel framework that enables large language models to perform long-horizon, structured reasoning beyond traditional context limits by modeling reasoning as trees and maintaining a selective working memory.
Contribution
The paper presents TIM and TIMRUN, a new approach for recursive problem solving and long-horizon reasoning in LLMs, overcoming context and memory constraints.
Findings
Supports virtually unlimited working memory and multi-hop tool calls
Maintains high inference throughput even when manipulating up to 90% of the KV cache in GPU memory
Achieves accurate reasoning on mathematical and retrieval tasks
Abstract
To break the context limits of large language models (LLMs) that bottleneck reasoning accuracy and efficiency, we propose the Thread Inference Model (TIM), a family of LLMs trained for recursive and decompositional problem solving, and TIMRUN, an inference runtime enabling long-horizon structured reasoning beyond context limits. Together, TIM hosted on TIMRUN supports virtually unlimited working memory and multi-hop tool calls within a single language model inference, overcoming output limits, positional-embedding constraints, and GPU-memory bottlenecks. Performance is achieved by modeling natural language as reasoning trees measured by both length and depth instead of linear sequences. The reasoning trees consist of tasks with thoughts, recursive subtasks, and conclusions, based on the concept we proposed in Schroeder et al., 2025. During generation, we maintain a working memory that…
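The tree structure named in the abstract (tasks with thoughts, recursive subtasks, and conclusions, measured by length and depth) can be illustrated with a small sketch. This is not the authors' implementation: the `Task` class, the helper names, and in particular the pruning rule in `prune_finished` are hypothetical, chosen only to show how a finished subtask might be collapsed to its conclusion so that working memory stays small. The abstract is truncated before it describes the actual working-memory rule, and TIMRUN is described elsewhere on this page as operating on the KV cache, which this sketch does not model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """One node of a reasoning tree: a thought, optional subtasks, a conclusion."""
    thought: str
    subtasks: List["Task"] = field(default_factory=list)
    conclusion: Optional[str] = None


def tree_size(task: Task) -> int:
    """Total number of task nodes (a rough stand-in for reasoning length)."""
    return 1 + sum(tree_size(sub) for sub in task.subtasks)


def tree_depth(task: Task) -> int:
    """Longest root-to-leaf path, counted in task nodes."""
    return 1 + max((tree_depth(sub) for sub in task.subtasks), default=0)


def prune_finished(task: Task) -> Task:
    """Hypothetical pruning rule (an assumption, not taken from the paper):
    a subtask that already has a conclusion keeps only its thought and
    conclusion; its nested subtasks are dropped."""
    kept = []
    for sub in task.subtasks:
        if sub.conclusion is not None:
            kept.append(Task(thought=sub.thought, conclusion=sub.conclusion))
        else:
            kept.append(prune_finished(sub))
    return Task(thought=task.thought, subtasks=kept, conclusion=task.conclusion)


# Toy example: one finished subtask with a nested step, one unfinished subtask.
root = Task(
    thought="Solve x^2 - 5x + 6 = 0",
    subtasks=[
        Task(
            thought="Factor the quadratic",
            subtasks=[
                Task(
                    thought="Find two numbers with product 6 and sum 5",
                    conclusion="2 and 3",
                )
            ],
            conclusion="(x - 2)(x - 3) = 0",
        ),
        Task(thought="Read off the roots"),  # still open, so it is kept as-is
    ],
)

pruned = prune_finished(root)
print(tree_size(root), tree_depth(root))      # 4 3
print(tree_size(pruned), tree_depth(pruned))  # 3 2
```

In this toy case the finished "Factor the quadratic" subtask loses its nested step but keeps its conclusion, so the retained tree is both shorter and shallower while the open subtask is untouched.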
Peer Reviews
Decision · Submitted to ICLR 2026
1. Introduction of a recursive and decompositional problem-solving method, improving structured reasoning. 2. Effective context pruning mechanism that supports virtually unlimited working memory and multi-hop tool calls. 3. High inference throughput maintained even with significant manipulation of GPU memory resources. 4. Demonstrated accuracy in reasoning on complex tasks, including mathematical problem-solving and information retrieval.
1. Lack of detailed comparison with existing models to highlight relative performance improvements. 2. Insufficient discussion on potential computational overhead introduced by the context pruning mechanism. 3. Limited exploration of the model's scalability across different domains and tasks.
The key innovation is not just the structured output but the direct manipulation of the KV cache based on this structure to enable dynamic context pruning. This moves beyond passive context compression or simple retrieval to an active, structured memory management system. The idea of enabling multi-hop tool use within a single inference call by handling tool I/O at the runtime level is a particularly elegant and significant departure from conventional, latency-prone agentic loops. This approa
1. Table 1 lacks necessary baselines, such as the accuracy of Qwen3-8B. 2. The authors mainly conduct evaluations on mathematics and web question-answering tasks. However, since the method in this paper targets tasks with extremely long contexts, the authors should also evaluate it on such tasks, for example processing extremely long documents and SWE-bench.
* The work introduces a new framework, Thread-2, that models reasoning as recursive trees of subtasks and allows dynamic pruning of irrelevant information for a clean working memory. This is a novel and interesting idea. * The proposed TIM offers a training recipe for recursive reasoning and multi-hop tool use in a single LLM inference. This shows how model training can be adapted to the proposed inference framework. * This work addresses a core problem in LLM-based reasoning, which is the limit
* The model is based on Qwen3-8B, yet its performance is much lower than the baseline reported in the technical report [1]. Qwen3-8B achieves 76.0 on AIME 2024 (Table 17, Qwen3 technical report), but the proposed models only achieve 40.0 (Table 1, the current work). * TIM executes the custom tools at the LLM inference backend and treats this as an advantage. However, this may not be the case for real-world applications, where the custom tools (e.g., computer use) may not be available at the inference backend
Taxonomy
Topics: Constraint Satisfaction and Optimization · Topic Modeling · Parallel Computing and Optimization Techniques
