Temporally Consistent Factuality Probing for Large Language Models
Ashutosh Bajpai, Aaryan Goyal, Atif Anwer, Tanmoy Chakraborty

TL;DR
This paper introduces TeCFaP, a new benchmark for evaluating the temporal consistency of factual information in large language models, and proposes a novel training framework CoTSeLF to improve their temporal factuality.
Contribution
The study presents TeCFaP, a temporally consistent factuality probe task, together with the TEMP-COFAC dataset of query paraphrases and extended metrics for temporal factuality, and introduces CoTSeLF, a training method to enhance temporal consistency in LLMs.
Findings
Most LLMs perform poorly on TeCFaP.
CoTSeLF improves temporal factuality in LLMs.
Extended metrics effectively measure temporal consistency.
Abstract
The prolific use of Large Language Models (LLMs) as an alternate knowledge base requires them to be factually consistent, necessitating both correctness and consistency traits for paraphrased queries. Recently, significant attempts have been made to build benchmark datasets and metrics to evaluate LLMs for these traits. However, structural simplicity (subject-relation-object) and contemporary association in their query formulation limit the broader definition of factuality and consistency. In this study, we introduce TeCFaP, a novel Temporally Consistent Factuality Probe task to expand the consistent factuality probe in the temporal dimension. To this end, we propose TEMP-COFAC, a high-quality dataset of prefix-style English query paraphrases. Subsequently, we extend the definitions of existing metrics to represent consistent factuality across the temporal dimension. We experiment with a diverse…
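The abstract asks for both correctness and consistency across paraphrased queries. The sketch below illustrates that general idea only; it is not the paper's metric definitions, which this summary does not give. The function name `probe_scores`, the exact-match comparison, and the three score names are assumptions made for illustration.

```python
from itertools import combinations


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def probe_scores(predictions: list[str], gold: str) -> dict[str, float]:
    """Score one fact queried through several paraphrases.

    Hypothetical helper, not taken from the paper:
      accuracy            - share of paraphrases answered correctly
      consistency         - share of paraphrase pairs with identical answers
      consistent_accuracy - 1.0 only if every paraphrase is answered correctly
    """
    if not predictions:
        raise ValueError("predictions must contain at least one answer")

    preds = [normalize(p) for p in predictions]
    target = normalize(gold)

    accuracy = sum(p == target for p in preds) / len(preds)

    # Compare every unordered pair of paraphrase answers.
    pairs = list(combinations(preds, 2))
    # A single paraphrase has no pairs; treat it as trivially consistent.
    consistency = sum(a == b for a, b in pairs) / len(pairs) if pairs else 1.0

    consistent_accuracy = 1.0 if all(p == target for p in preds) else 0.0

    return {
        "accuracy": accuracy,
        "consistency": consistency,
        "consistent_accuracy": consistent_accuracy,
    }


# Made-up model answers to four paraphrases of one query (illustrative data).
scores = probe_scores(["1969", "1969", "1968", " 1969 "], gold="1969")
print(scores)
```

With these made-up answers, three of four paraphrases are correct, but only three of the six answer pairs agree, so a model can look fairly accurate while still being inconsistent. Exact string match is the simplest comparison; a real evaluation would likely need answer aliasing.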
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
Methods: Sparse Evolutionary Training · Balanced Selection
