LTLBench: Towards Benchmarks for Evaluating Temporal Reasoning in Large Language Models

Weizhi Tang; Kwabena Nuamah; Vaishak Belle

arXiv:2407.05434·cs.CL·January 6, 2026

TL;DR

This paper introduces LTLBench, a new benchmark dataset of 2000 challenges based on Linear Temporal Logic, to evaluate and analyze the temporal reasoning abilities of large language models.

Contribution

It presents a novel approach using LTL to synthesize challenges, creates a comprehensive dataset, and benchmarks multiple LLMs to understand their temporal reasoning capabilities.

Findings

01. LLMs show varied performance on LTL-based challenges.

02. Increasing complexity affects LLM reasoning and performance unexpectedly.

03. Qualitative analysis reveals key issues in LLM temporal reasoning processes.

Abstract

Temporal Reasoning (TR) is a critical ability for LLMs to understand and reason over temporal information and relationships between events. To study the TR ability in LLMs, prior works provide different ways for evaluating various aspects of TR ability. In this work, we propose an alternative perspective for evaluating TR ability by leveraging Linear Temporal Logic (LTL), and develop a pipeline to automatically synthesize challenges for assessing the TR ability of LLMs. Based on this pipeline, we construct a dataset, namely LTLBench, consisting of 2000 TR challenges, and benchmark 12 LLMs across 5 different methods. Furthermore, we conduct additional experiments to investigate the impact of increasing the number of formula operators and events on both LLM performance and the complexity of TR problems. We also perform qualitative analyses of their reasoning processes and the effects of…
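To make the idea of an LTL-based challenge concrete, the following is a minimal sketch of checking an LTL formula against a sequence of events. It is not the authors' pipeline: the tuple encoding of formulas, the `holds` function, and the example trace are all hypothetical, and the sketch assumes finite-trace semantics as a simplification (standard LTL is defined over infinite traces).

```python
from typing import List, Set

# Hypothetical encoding: a formula is a nested tuple whose first item is the operator.
#   ("ap", "a")        atomic proposition: event "a" occurs at the current step
#   ("not", f), ("and", f, g), ("or", f, g), ("implies", f, g)
#   ("X", f)           next:      f holds at the following step
#   ("F", f)           finally:   f holds at some step from now on
#   ("G", f)           globally:  f holds at every step from now on
#   ("U", f, g)        until:     g eventually holds, and f holds at every step before that
Trace = List[Set[str]]  # one set of occurring events per time step


def holds(f: tuple, trace: Trace, i: int = 0) -> bool:
    """Return True if formula f holds at position i of a finite trace."""
    op = f[0]
    n = len(trace)
    if op == "ap":
        return i < n and f[1] in trace[i]
    if op == "not":
        return not holds(f[1], trace, i)
    if op == "and":
        return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == "or":
        return holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == "implies":
        return (not holds(f[1], trace, i)) or holds(f[2], trace, i)
    if op == "X":
        # Assumption: "next" is false at the last step of a finite trace.
        return i + 1 < n and holds(f[1], trace, i + 1)
    if op == "F":
        return any(holds(f[1], trace, j) for j in range(i, n))
    if op == "G":
        return all(holds(f[1], trace, j) for j in range(i, n))
    if op == "U":
        return any(
            holds(f[2], trace, j) and all(holds(f[1], trace, k) for k in range(i, j))
            for j in range(i, n)
        )
    raise ValueError(f"unknown operator: {op!r}")


# Illustrative trace: events "a" and "b" alternate over four steps.
trace = [{"a"}, {"b"}, {"a"}, {"b"}]

a, b = ("ap", "a"), ("ap", "b")
always_a_then_b = ("G", ("implies", a, ("X", b)))  # every "a" is followed by "b"
a_and_b_together = ("F", ("and", a, b))            # "a" and "b" co-occur at some step
a_until_b = ("U", a, b)                            # "a" holds until "b" occurs
```

On this trace, `holds(always_a_then_b, trace)` and `holds(a_until_b, trace)` evaluate to `True`, while `holds(a_and_b_together, trace)` evaluates to `False`. A challenge in this spirit would present the events and the formula in natural language and ask the model for the same true/false judgement.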

Code & Models

Repositories

rutatang/ltlbench (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Logic, Reasoning, and Knowledge