Evaluating List Construction and Temporal Understanding capabilities of Large Language Models

Alexandru Dumitru; V Venktesh; Adam Jatowt; Avishek Anand

arXiv:2506.21783·cs.CL·June 30, 2025

TL;DR

This paper introduces the TLQA benchmark to evaluate large language models' abilities in list construction and temporal understanding, revealing significant shortcomings and guiding future research directions.

Contribution

The paper presents the first benchmark specifically designed to assess LLMs' capabilities in structured list answering with temporal context, addressing a gap in existing evaluations.

Findings

01 Current models struggle with complete, temporally aligned answers in closed-book settings.

02 Models require improved retrieval strategies for open-domain temporal question answering.

03 Significant gaps in temporal reasoning capabilities of state-of-the-art LLMs are identified.

Abstract

Large Language Models (LLMs) have demonstrated immense advances in a wide range of natural language tasks. However, these models are susceptible to hallucinations and errors, particularly on temporal understanding tasks that involve multiple entities in answers. In such tasks, they fail to associate entities with accurate time intervals, to generate a complete list of entities in answers, or to reason about events associated with specific temporal bounds. Existing works do not extensively evaluate the ability of models to perform implicit and explicit temporal understanding in a list answer construction setup. To bridge this gap, we propose the Time referenced List based Question Answering (TLQA) benchmark, which requires structured answers in list format aligned with corresponding time periods. Our TLQA benchmark requires both list construction and temporal understanding simultaneously,…
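
To make the idea of a list answer aligned with time periods concrete, here is a minimal Python sketch. It is an illustration only, not the benchmark's actual data format or scoring method: the `TimedEntity` class, the helper functions, the year-level granularity, and the "Team A" / "Team B" data are all hypothetical. It checks the two failure modes the abstract names, incomplete lists and inaccurate time intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedEntity:
    """Hypothetical list item: an entity plus an inclusive year span."""
    name: str
    start_year: int
    end_year: int


def entity_recall(predicted, gold):
    """Fraction of gold entities present in the prediction (list completeness)."""
    if not gold:
        return 1.0
    predicted_names = {p.name for p in predicted}
    return sum(g.name in predicted_names for g in gold) / len(gold)


def year_overlap(a, b):
    """Jaccard overlap of two inclusive year spans, between 0.0 and 1.0."""
    intersection = min(a.end_year, b.end_year) - max(a.start_year, b.start_year) + 1
    if intersection <= 0:
        return 0.0
    union = (a.end_year - a.start_year + 1) + (b.end_year - b.start_year + 1) - intersection
    return intersection / union


def mean_temporal_overlap(predicted, gold):
    """Average span overlap over gold entities; missing entities count as 0."""
    if not gold:
        return 1.0
    by_name = {p.name: p for p in predicted}
    total = sum(year_overlap(by_name[g.name], g) for g in gold if g.name in by_name)
    return total / len(gold)


# Hypothetical question: "Which teams did player X play for from 2010 to 2020?"
gold = [
    TimedEntity("Team A", 2010, 2014),
    TimedEntity("Team B", 2015, 2020),
]
predicted = [
    TimedEntity("Team A", 2010, 2012),  # right entity, span too short; Team B missing
]

print(entity_recall(predicted, gold))
print(mean_temporal_overlap(predicted, gold))
```

In this made-up case the prediction recovers one of two entities (recall 0.5) and only part of that entity's span (mean overlap 0.3), so it is penalized on both completeness and temporal alignment.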

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare