Empirical Characterization of Temporal Constraint Processing in LLMs
Javier Marín

TL;DR
This paper empirically evaluates how well large language models process real-time temporal constraints, revealing significant limitations and risks in their current capabilities for time-critical decision-making.
Contribution
It provides a systematic characterization of temporal constraint processing in LLMs, highlighting their limitations and the need for architectural mechanisms beyond next-token prediction.
Findings
Models show bimodal performance: each achieves either 95% or 50% accuracy.
Prompt formatting causes 30-60 percentage point accuracy swings.
Fine-tuning on 200 synthetic examples improves models with partial capability by 12-37 percentage points.
Abstract
When deploying LLMs in agentic architectures requiring real-time decisions under temporal constraints, we assume they reliably determine whether action windows remain open or have closed. This assumption is untested. We characterize temporal constraint processing across eight production-scale models (2.8-8B parameters) using deadline detection tasks, revealing systematic deployment risks: bimodal performance distribution (models achieve either 95% or 50% accuracy), extreme prompt brittleness (30-60 percentage point swings from formatting changes alone), and systematic action bias (100% false positive rates in failing models). Parameter count shows no correlation with capability in this range: a 3.8B model matches 7B models while other 7B models fail completely. Fine-tuning on 200 synthetic examples improves models with partial capability by 12-37 percentage points. We demonstrate that…
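The relationship between 50% accuracy and a 100% false positive rate can be illustrated with a small scoring sketch. This is not the paper's evaluation code: the `Trial` structure, the `now < deadline` convention for an open window, and the balanced toy set are all assumptions made for illustration, since the abstract does not specify the task format or class balance. Under those assumptions, a model that always answers "act" scores exactly 50% accuracy with every closed window counted as a false positive.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical trial format; the paper's actual prompt format is not given here.
    now: float        # current time, in seconds
    deadline: float   # time at which the action window closes


def window_open(trial: Trial) -> bool:
    """Ground truth under an assumed convention: open while now < deadline."""
    return trial.now < trial.deadline


def score(trials, predictions):
    """Return (accuracy, false_positive_rate), where 'positive' means 'act'."""
    truths = [window_open(t) for t in trials]
    correct = sum(p == y for p, y in zip(predictions, truths))
    false_pos = sum(p and not y for p, y in zip(predictions, truths))
    negatives = sum(not y for y in truths)
    accuracy = correct / len(truths)
    fpr = false_pos / negatives if negatives else 0.0
    return accuracy, fpr


# Assumed balanced toy set: now = 0..9 against a deadline of 5,
# so five windows are open and five are closed.
trials = [Trial(now=t, deadline=5.0) for t in range(10)]

# A model with pure action bias always says the window is open.
always_act = [True] * len(trials)
accuracy, fpr = score(trials, always_act)
print(accuracy, fpr)  # 0.5 1.0
```

Reporting the false positive rate alongside accuracy separates a model that guesses at random from one that always acts; both score near 50% on a balanced set, but only the latter would act on every closed window.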
Taxonomy
Topics: Constraint Satisfaction and Optimization · AI-based Problem Solving and Planning · Multimodal Machine Learning Applications
