Your LLM Agents are Temporally Blind: The Misalignment Between Tool Use Decisions and Human Time Perception

Yize Cheng; Arshia Soltani Moakhar; Chenrui Fan; Parsa Hosseini; Kazem Faghih; Zahra Sodagar; Wenxiao Wang; Soheil Feizi

arXiv:2510.23853 · cs.CL · April 17, 2026

TL;DR

This paper identifies and addresses temporal blindness in LLM agents (their failure to account for real-world time elapsed between messages), which degrades their tool-use decisions in dynamic environments, and introduces a dataset and methods to improve their temporal alignment.

Contribution

The authors introduce the TicToc dataset and analyze how existing LLMs fail to align tool use with human time perception, proposing post-training alignment as a solution.

Findings

01

Existing models achieve an alignment rate below 65% with human preferences when given time information.

02

Naive prompt-based methods are limited in improving temporal alignment.

03

Post-training alignment techniques can enhance LLMs' temporal awareness.
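
Finding 01 is stated as an alignment rate. The abstract is truncated before it defines the metric, so the sketch below assumes the simplest reading: the share of samples where the model's "call a tool" or "answer directly" decision matches the human-preferred one. `Sample`, `alignment_rate`, and `balanced_alignment_rate` are hypothetical names. The class-balanced variant is one common way to keep a constant policy from scoring well on a skewed label mix; it is not necessarily the metric the authors use.

```python
from dataclasses import dataclass
from typing import List, Literal

# The two actions compared in the abstract.
Decision = Literal["call_tool", "answer_directly"]


@dataclass
class Sample:
    # Hypothetical record for one turn: the human-preferred action and the model's action.
    human_preference: Decision
    model_decision: Decision


def alignment_rate(samples: List[Sample]) -> float:
    """Fraction of samples where the model's decision matches the human preference."""
    if not samples:
        raise ValueError("need at least one sample")
    matches = sum(s.model_decision == s.human_preference for s in samples)
    return matches / len(samples)


def balanced_alignment_rate(samples: List[Sample]) -> float:
    """Mean of per-class agreement rates; a policy that always picks one action
    scores 0.5 whenever both classes are present."""
    if not samples:
        raise ValueError("need at least one sample")
    per_class = []
    for label in ("call_tool", "answer_directly"):
        subset = [s for s in samples if s.human_preference == label]
        if subset:
            per_class.append(alignment_rate(subset))
    return sum(per_class) / len(per_class)


# Usage: a model that always calls the tool, on a set where humans mostly prefer a call.
always_call = [Sample("call_tool", "call_tool")] * 3 + [Sample("answer_directly", "call_tool")]
print(alignment_rate(always_call))           # 0.75
print(balanced_alignment_rate(always_call))  # 0.5
```

The plain rate rewards the always-call policy for the label skew; the balanced rate exposes that it never answers directly when it should.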

Abstract

Large language model (LLM) agents are increasingly used to interact with and execute tasks in dynamic environments. However, a critical yet overlooked limitation of these agents is that they, by default, assume a stationary context, failing to account for the real-world time elapsed between messages. We refer to this as "temporal blindness". This limitation hinders decisions about when to invoke tools, leading agents to either over-rely on stale context and skip needed tool calls, or under-rely on it and redundantly repeat tool calls. To study this challenge, we constructed TicToc, a diverse dataset of multi-turn user-agent message trajectories across 76 scenarios, spanning dynamic environments with high, medium, and low time sensitivity. We collected human preferences between "calling a tool" and "directly answering" on each sample, and evaluated how well LLM tool-calling decisions…
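
The two failure modes in the abstract can be made concrete with a simple staleness rule: if the last tool result is older than some freshness window, answering from context risks relying on stale information; if it is newer, calling again is redundant. The sketch below is a hypothetical heuristic, not the paper's method (the authors propose post-training alignment). `Sensitivity`, its freshness windows, `should_call_tool`, and `annotate` are illustrative assumptions, and the window values are not taken from TicToc. `annotate` shows the kind of timestamp prefix that makes elapsed time visible to a model in the first place.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Sensitivity(Enum):
    # Hypothetical freshness windows per time-sensitivity level (illustrative values only).
    HIGH = timedelta(minutes=1)
    MEDIUM = timedelta(hours=1)
    LOW = timedelta(days=1)


def should_call_tool(last_tool_call: Optional[datetime], now: datetime,
                     sensitivity: Sensitivity) -> bool:
    """Call the tool if there is no earlier result or it is older than the freshness window."""
    if last_tool_call is None:
        return True  # nothing in context to answer from
    return now - last_tool_call > sensitivity.value


def annotate(message: str, sent_at: datetime) -> str:
    """Prefix a message with its send time so elapsed time is visible in the context."""
    return f"[{sent_at.isoformat(timespec='seconds')}] {message}"


# Usage: five minutes after the last lookup.
t0 = datetime(2025, 10, 1, 9, 0, 0)
later = t0 + timedelta(minutes=5)
print(annotate("What is the current price?", later))  # [2025-10-01T09:05:00] What is the current price?
print(should_call_tool(t0, later, Sensitivity.HIGH))    # True: the earlier result is stale
print(should_call_tool(t0, later, Sensitivity.MEDIUM))  # False: a repeat call would be redundant
```

A fixed threshold like this is brittle (the right window depends on the task and the user), which is one reason the paper studies learned, preference-aligned decisions instead.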

Code & Models

Repositories

othmanadi/chronos
github

Datasets

yizecheng/TicToc
dataset · 55 dl