Real-Time Reasoning Agents in Evolving Environments

Yule Wen; Yixin Ye; Yanzhe Zhang; Diyi Yang; Hao Zhu

arXiv:2511.04898 · cs.AI · November 10, 2025

TL;DR

This paper introduces the concept of real-time reasoning for agents in dynamic environments, studies reactive and planning paradigms for deploying language models, and proposes a framework to improve timely decision-making under evolving conditions.

Contribution

It defines real-time reasoning as a new problem, develops the Real-Time Reasoning Gym, and proposes AgileThinker, a hybrid approach combining reactive and planning paradigms.

Findings

1. AgileThinker outperforms single-paradigm agents under time pressure.
2. State-of-the-art models struggle to make judgments that are both logical and timely, in either paradigm.
3. The framework establishes real-time reasoning as a key testbed for practical AI agents.

Abstract

Agents in the real world must make not only logical but also timely judgments. This requires continuous awareness of the dynamic environment: hazards emerge, opportunities arise, and other agents act, while the agent's reasoning is still unfolding. Despite advances in language model reasoning, existing approaches fail to account for this dynamic nature. We introduce real-time reasoning as a new problem formulation for agents in evolving environments and build Real-Time Reasoning Gym to demonstrate it. We study two paradigms for deploying language models in agents: (1) reactive agents, which employ language models with bounded reasoning computation for rapid responses, and (2) planning agents, which allow extended reasoning computation for complex problems. Our experiments show that even state-of-the-art models struggle with making logical and timely judgments in either paradigm. To…
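The central point of the formulation is that the environment keeps evolving while the agent is still reasoning. A minimal Python sketch of that effect, assuming (hypothetically) that the environment advances one step per fixed number of generated tokens and applies a default no-op action whenever the agent has not yet committed one. `TokenClock`, `executed_actions`, and `survives` are illustrative names, not the Real-Time Reasoning Gym API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenClock:
    # Hypothetical: the environment advances one step per `tokens_per_step`
    # generated tokens, whether or not the agent has finished reasoning.
    tokens_per_step: int

    def steps_elapsed(self, tokens: int) -> int:
        return tokens // self.tokens_per_step


def executed_actions(clock: TokenClock, reasoning_tokens: int,
                     plan: List[str], default: str = "noop") -> List[str]:
    """Actions the environment actually applies, in order.

    Every step that elapses while the agent is still reasoning receives the
    `default` action; the agent's plan starts executing only afterwards.
    """
    stalled = clock.steps_elapsed(reasoning_tokens)
    return [default] * stalled + list(plan)


def survives(actions: List[str], hazard_step: int, safe_action: str) -> bool:
    # True if `safe_action` is executed at or before step `hazard_step` (0-indexed).
    return safe_action in actions[: hazard_step + 1]


clock = TokenClock(tokens_per_step=1000)
# Reactive agent: short reasoning, so its action lands immediately.
reactive = executed_actions(clock, reasoning_tokens=500, plan=["dodge", "up"])
# Planning agent: long reasoning; four steps pass before its plan starts.
planner = executed_actions(clock, reasoning_tokens=4000, plan=["dodge", "up", "up"])
print(survives(reactive, hazard_step=2, safe_action="dodge"))  # True
print(survives(planner, hazard_step=2, safe_action="dodge"))   # False
```

The planning agent's plan is not wrong; it simply arrives after the hazard, a failure that an environment waiting for the agent would never penalize.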

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. It innovatively defines the "real-time reasoning" problem for LLM agents (addressing the flaw that environments evolve in parallel with agent reasoning) and builds Real-Time Reasoning Gym with three games (Freeway, Snake, Overcooked) to systematically control cognitive load and time pressure (using tokens as a hardware-agnostic time proxy).
2. The proposed AgileThinker has an innovative dual-thread design: its reactive thread references partial planning traces for real-time decisions, solving the
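The dual-thread idea described here (a reactive thread that reads the planner's partial trace) can be illustrated with a single-process simulation. This is a minimal sketch under stated assumptions, not AgileThinker's implementation: the planner's output is treated as a token list that becomes visible at a fixed rate per environment step, the reactive thread is a plain function of the visible prefix, and the reactive thread's token cap is not modeled. `agile_loop`, `latest_hint_policy`, and the `ACTION:` marker are hypothetical.

```python
from typing import Callable, List

# Hypothetical type: reactive policy maps (step, visible planner trace) -> action.
ReactivePolicy = Callable[[int, List[str]], str]


def agile_loop(planner_trace: List[str], planner_tokens_per_step: int,
               reactive_policy: ReactivePolicy, num_steps: int) -> List[str]:
    """Single-process simulation of a dual-thread agent (illustrative only).

    The planning thread is modeled as a pre-recorded trace that becomes
    visible `planner_tokens_per_step` tokens at a time. At each environment
    step the reactive thread must act using only the prefix seen so far.
    """
    actions: List[str] = []
    for step in range(num_steps):
        visible = planner_trace[: (step + 1) * planner_tokens_per_step]
        actions.append(reactive_policy(step, visible))
    return actions


def latest_hint_policy(step: int, visible: List[str]) -> str:
    # Toy reactive policy: follow the most recent "ACTION:<a>" hint the
    # planner has emitted so far; fall back to "stay" if there is none yet.
    for token in reversed(visible):
        if token.startswith("ACTION:"):
            return token.split(":", 1)[1]
    return "stay"


trace = ["lane", "3", "blocked", "ACTION:up", "check", "lane", "4", "ACTION:wait"]
print(agile_loop(trace, planner_tokens_per_step=2,
                 reactive_policy=latest_hint_policy, num_steps=4))
# ['stay', 'up', 'up', 'wait']
```

The reactive thread never blocks on the planner: before any hint exists it falls back to a default, and as the trace grows it picks up the newest suggestion.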

Weaknesses

1. AgileThinker lacks an adaptive mechanism for thread resource allocation: the optimal token budget for the reactive thread ($N_{TR}$) varies across environments (e.g., ~5k tokens for Freeway vs. ~2k tokens for Snake/Overcooked) and requires manual empirical tuning, with no solution proposed to dynamically adjust it based on real-time environmental changes.
2. Experimental scenarios are disconnected from real-world complexity: all experiments are conducted on three simulated games (Freeway, S
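The per-environment tuning of $N_{TR}$ amounts to a sweep over candidate budgets. A minimal sketch, assuming a hypothetical `evaluate` callback that runs episodes with a given budget and returns their mean score; `tune_reactive_budget` and the toy objective are illustrative, and no scores from the paper are reproduced.

```python
from typing import Callable, Dict, Iterable, Tuple


def tune_reactive_budget(candidates: Iterable[int],
                         evaluate: Callable[[int], float]) -> Tuple[int, Dict[int, float]]:
    """Grid-search the reactive-thread budget N_TR for one environment.

    `evaluate(n_tr)` is a hypothetical callback that runs episodes with the
    given budget and returns their mean score; the best budget is returned
    along with every score so the sweep can be inspected.
    """
    scores = {n_tr: evaluate(n_tr) for n_tr in candidates}
    best = max(scores, key=lambda n_tr: scores[n_tr])
    return best, scores


# Toy objective peaking at 2,000 tokens; invented for illustration, not measured.
def toy_evaluate(n_tr: int) -> float:
    return -abs(n_tr - 2000) / 1000


best, scores = tune_reactive_budget([1000, 2000, 4000, 8000], toy_evaluate)
print(best)  # 2000
```

Because the sweep runs once per environment and the chosen budget stays fixed during an episode, it reflects exactly the limitation raised above: nothing in it adapts $N_{TR}$ to changes in the environment.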

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. Real-time reasoning of AI agents under time pressure is an under-evaluated direction in the current wave of AI agents research, and this paper presents a good setting and three diverse synthetic environments to prototype research in this direction. Experiments show that with increased time pressure and task complexity, existing agents' performance decreases significantly relative to the setting where the environment waits indefinitely for agent reasoning.
2. The paper presents AgileThinker, an agent desi

Weaknesses

1. The assumption that the token limit is a good proxy for actual wall-clock time holds only when all agents are implemented with the same LLM served on the same hardware. Even if hardware independence is desirable, different models are unlikely to share the same slope (time per output token, TPOT) and intercept (whatever factors go into time-to-first-token), because of differences in model size, model architecture, etc. The paper's formulation and experiments neglect this important consideration.
2. As also seen
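This objection can be made concrete with a simple linear latency model in which wall-clock time is roughly TTFT plus the number of output tokens times TPOT. The sketch is illustrative only: `LatencyProfile` is a hypothetical helper and its numbers are invented, not measured for any model or server. With these values the same 2,000-token budget takes about 20 s for one profile and about 61 s for the other.

```python
from dataclasses import dataclass


@dataclass
class LatencyProfile:
    """Linear latency model for one model/hardware pair (hypothetical numbers)."""
    ttft_s: float  # intercept: time to first token, in seconds
    tpot_s: float  # slope: time per output token, in seconds

    def wall_time(self, output_tokens: int) -> float:
        # Wall-clock seconds to generate `output_tokens` tokens.
        return self.ttft_s + self.tpot_s * output_tokens


# Two made-up profiles; not measurements of any real model or server.
fast = LatencyProfile(ttft_s=0.2, tpot_s=0.01)
slow = LatencyProfile(ttft_s=0.8, tpot_s=0.03)

budget = 2000  # identical token budget for both agents
for name, profile in [("fast", fast), ("slow", slow)]:
    print(name, round(profile.wall_time(budget), 2))
```

Under this model, a token budget corresponds to the same wall-clock deadline for two agents only if their TTFT and TPOT match, which is the condition the reviewer says the token proxy implicitly assumes.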

Reviewer 03 · Rating 8 · Confidence 4

Strengths

1. The work provides a very novel and interesting problem setting for language agents, lifting the unrealistic assumption that the environment waits for agents to execute their actions.
2. The motivation is clear, and overall the paper is well-written.
3. The quantitative results are very strong, and the authors additionally provide a qualitative case study further illustrating the benefits of AgileThinker.

Weaknesses

I could not discern any significant weaknesses, but I do have some questions in a few places (see questions section).

Datasets

SALT-NLP/RealtimeGym (dataset, 158 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · AI-based Problem Solving and Planning · Constraint Satisfaction and Optimization