Let LLMs Take on the Latest Challenges! A Chinese Dynamic Question Answering Benchmark
Zhikun Xu, Yinghui Li, Ruixue Ding, Xinyu Wang, Boli Chen, Yong Jiang, Hai-Tao Zheng, Wenlian Lu, Pengjun Xie, Fei Huang

TL;DR
This paper introduces CDQA, a Chinese dynamic question-answering benchmark based on recent news, to evaluate and improve LLMs' ability to handle evolving information in Chinese language tasks.
Contribution
The paper presents a new benchmark for Chinese dynamic QA, with a high-quality, human-model combined data pipeline and detailed classification of answer change frequency.
Findings
Current Chinese LLMs struggle with dynamic questions
CDQA is challenging and highlights areas for improvement
Benchmark facilitates future research in Chinese LLM capabilities
Abstract
How to better evaluate the capabilities of Large Language Models (LLMs) is a focal point and hot topic in current LLM research. Previous work has noted that, because iteratively updating LLMs is extremely costly, they are often unable to answer the latest dynamic questions well. To promote the improvement of Chinese LLMs' ability to answer dynamic questions, in this paper we introduce CDQA, a Chinese Dynamic QA benchmark containing question-answer pairs related to the latest news on the Chinese Internet. We obtain high-quality data through a pipeline that combines humans and models, and carefully classify the samples according to the frequency of answer changes to facilitate a more fine-grained observation of LLMs' capabilities. We have also evaluated and analyzed mainstream and advanced Chinese LLMs on CDQA. Extensive experiments and valuable insights suggest that our proposed…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Translation Studies and Practices
