A Dataset for Answering Time-Sensitive Questions
Wenhu Chen, Xinyi Wang, William Yang Wang

TL;DR
This paper introduces a new dataset for evaluating QA models' ability to understand and reason over time-sensitive facts, highlighting current models' limitations in temporal reasoning.
Contribution
The paper constructs a novel time-sensitive QA dataset from WikiData and Wikipedia, providing a benchmark for temporal reasoning in NLP models.
Findings
State-of-the-art models perform poorly on the dataset, with FiD achieving only 46% accuracy.
The dataset reveals significant gaps in models' temporal understanding and reasoning.
It serves as a benchmark to improve NLP models' sensitivity to temporal shifts.
Abstract
Time is an important dimension in our physical world. Lots of facts can evolve with respect to time. For example, the U.S. President might change every four years. Therefore, it is important to consider the time dimension and empower the existing QA models to reason over time. However, the existing QA datasets contain rather few time-sensitive questions, hence not suitable for diagnosing or benchmarking the model's temporal reasoning capability. In order to promote research in this direction, we propose to construct a time-sensitive QA dataset. The dataset is constructed by 1) mining time-evolving facts from WikiData and aligning them to their corresponding Wikipedia page, 2) employing crowd workers to verify and calibrate these noisy facts, 3) generating question-answer pairs based on the annotated time-sensitive facts. Our dataset poses challenges in the aspect of both temporal…
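Step 3 of the construction described above (generating question-answer pairs from time-scoped facts) can be illustrated with a minimal sketch. This is not the authors' pipeline. The `TimedFact` record, the `answer_at` and `generate_qa_pairs` helpers, the question template, and the year-level half-open intervals are all assumptions made for illustration. The three sample facts are hand-written rather than mined from WikiData.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TimedFact:
    """Hypothetical record: `obj` holds for (subject, relation) during [start, end)."""
    subject: str
    relation: str
    obj: str
    start: int  # first year the fact holds (inclusive)
    end: int    # year the fact stops holding (exclusive)


# Hand-written sample facts at year granularity (a simplification:
# real terms change on specific dates, not at year boundaries).
FACTS = [
    TimedFact("the United States", "president", "George W. Bush", 2001, 2009),
    TimedFact("the United States", "president", "Barack Obama", 2009, 2017),
    TimedFact("the United States", "president", "Donald Trump", 2017, 2021),
]


def answer_at(facts: List[TimedFact], subject: str, relation: str,
              year: int) -> Optional[str]:
    """Return the object valid in `year`, or None if no fact covers that year."""
    for fact in facts:
        if (fact.subject == subject and fact.relation == relation
                and fact.start <= year < fact.end):
            return fact.obj
    return None


def generate_qa_pairs(facts: List[TimedFact],
                      template: str = "Who was the {relation} of {subject} in {year}?"
                      ) -> List[Tuple[str, str]]:
    """Emit one (question, answer) pair per fact, asked at the interval midpoint."""
    pairs = []
    for fact in facts:
        year = (fact.start + fact.end) // 2
        question = template.format(relation=fact.relation,
                                   subject=fact.subject, year=year)
        pairs.append((question, fact.obj))
    return pairs


if __name__ == "__main__":
    for question, answer in generate_qa_pairs(FACTS):
        print(question, "->", answer)
```

The same question template yields different answers depending on the year it mentions, which is the property that makes a question time-sensitive. Returning `None` for an uncovered year models the case where no annotated fact answers the question.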
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: BigBird
