Evaluating List Construction and Temporal Understanding capabilities of Large Language Models
Alexandru Dumitru, V Venktesh, Adam Jatowt, Avishek Anand

TL;DR
This paper introduces the TLQA benchmark to evaluate large language models' abilities in list construction and temporal understanding. It shows that current models struggle to produce complete, temporally aligned answers in closed-book settings and need better retrieval strategies in open-domain settings.
Contribution
The paper presents the first benchmark specifically designed to assess LLMs' capabilities in structured list answering with temporal context, addressing a gap in existing evaluations.
Findings
Current models struggle with complete, temporally aligned answers in closed-book settings.
Models require improved retrieval strategies for open-domain temporal question answering.
Significant gaps in temporal reasoning capabilities of state-of-the-art LLMs are identified.
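To make "complete" and "temporally aligned" concrete, here is a minimal sketch of how a list answer with time periods might be scored. Everything in it is an assumption for illustration: the record layout (entity, start year, end year), the function names, the two scoring rules, and the fictional team data. It is not the benchmark's data format or official metric.

```python
from typing import List, Tuple

# Hypothetical record layout: (entity, start_year, end_year), years inclusive.
Record = Tuple[str, int, int]


def entity_recall(gold: List[Record], pred: List[Record]) -> float:
    """Completeness: fraction of gold entities that the prediction names."""
    gold_entities = {entity for entity, _, _ in gold}
    pred_entities = {entity for entity, _, _ in pred}
    if not gold_entities:
        return 1.0
    return len(gold_entities & pred_entities) / len(gold_entities)


def year_overlap(a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """Jaccard overlap of two inclusive year ranges."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def temporal_alignment(gold: List[Record], pred: List[Record]) -> float:
    """Mean year overlap over gold entities that the prediction also names.

    Simplification (assumed): each entity appears at most once per list.
    """
    pred_by_entity = {entity: (start, end) for entity, start, end in pred}
    scores = [
        year_overlap((start, end), pred_by_entity[entity])
        for entity, start, end in gold
        if entity in pred_by_entity
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Fictional example: "Which teams did player X play for, and when?"
gold = [("Team A", 2001, 2004), ("Team B", 2005, 2010), ("Team C", 2011, 2012)]
pred = [("Team A", 2001, 2004), ("Team B", 2005, 2008)]

recall = entity_recall(gold, pred)          # 2 of 3 gold entities are named
alignment = temporal_alignment(gold, pred)  # Team A matches exactly, Team B partly
print(f"entity recall: {recall:.3f}, temporal alignment: {alignment:.3f}")
```

In this toy case the prediction misses one entity (an incomplete list) and shortens one interval (a misaligned time period), which are the two failure modes the findings describe. Keeping the two scores separate is a design choice of the sketch, so that a missing entity and a wrong date range can be told apart.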
Abstract
Large Language Models (LLMs) have demonstrated immense advances in a wide range of natural language tasks. However, these models are susceptible to hallucinations and errors, particularly on temporal understanding tasks involving multiple entities in answers. In such tasks, they fail to associate entities with accurate time intervals, generate a complete list of entities in answers, or reason about events associated with specific temporal bounds. Existing works do not extensively evaluate the ability of models to perform implicit and explicit temporal understanding in a list answer construction setup. To bridge this gap, we propose the Time referenced List based Question Answering (TLQA) benchmark, which requires structured answers in list format aligned with corresponding time periods. Our TLQA benchmark requires both list construction and temporal understanding simultaneously,…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare
