RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge
Yi Liu, Lianzhe Huang, Shicheng Li, Sishuo Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun

TL;DR
This paper introduces RECALL, a benchmark that evaluates whether large language models can distinguish reliable external knowledge from counterfactual information. It shows that the models are susceptible to misinformation and that simple mitigation strategies offer limited protection.
Contribution
The paper presents a new benchmark with two tasks to assess LLMs' robustness against external counterfactual knowledge, highlighting their vulnerability and the need for improved methods.
Findings
LLMs are easily misled by unreliable external information.
Simple intervention methods have limited effectiveness.
Existing models struggle to discern factual from counterfactual knowledge.
Abstract
LLMs and AI chatbots have improved people's efficiency in various fields. However, the necessary knowledge for answering the question may be beyond the models' knowledge boundaries. To mitigate this issue, many researchers try to introduce external knowledge, such as knowledge graphs and Internet contents, into LLMs for up-to-date information. However, the external information from the Internet may include counterfactual information that will confuse the model and lead to an incorrect response. Thus there is a pressing need for LLMs to possess the ability to distinguish reliable information from external knowledge. Therefore, to evaluate the ability of LLMs to discern the reliability of external knowledge, we create a benchmark from existing knowledge bases. Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context…
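As a rough illustration of the Question Answering setup described above, the following Python sketch scores a model on items whose context may contain an edited (counterfactual) fact. Everything here is an assumption made for illustration and is not taken from the paper: the `QAItem` fields, the `score` function, the `misled_rate` name, the exact-match comparison, and the toy `copy_last_word` model are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class QAItem:
    """Hypothetical record layout; not the benchmark's actual schema."""
    question: str
    context: str                # passage shown to the model; may contain an edited fact
    true_answer: str            # answer supported by the original knowledge base
    counterfactual_answer: str  # answer implied by the edited fact


def score(items: List[QAItem], answer_fn: Callable[[str, str], str]) -> Dict[str, float]:
    """Return exact-match accuracy and the share of answers that follow the edited fact."""
    if not items:
        raise ValueError("items must not be empty")
    correct = 0
    misled = 0
    for item in items:
        prediction = answer_fn(item.question, item.context).strip().lower()
        if prediction == item.true_answer.lower():
            correct += 1
        elif prediction == item.counterfactual_answer.lower():
            misled += 1
    total = len(items)
    return {"accuracy": correct / total, "misled_rate": misled / total}


def copy_last_word(question: str, context: str) -> str:
    """Toy stand-in for a model that trusts the context blindly."""
    return context.split()[-1].rstrip(".")


items = [
    # Context edited to state a false fact.
    QAItem("What is the capital of France?",
           "The capital of France is Lyon.", "Paris", "Lyon"),
    # Context left unedited.
    QAItem("How many legs does a spider have?",
           "The number of legs a spider has is eight.", "eight", "six"),
]

result = score(items, copy_last_word)
print(result)
```

In this toy run the context-copying model answers the unedited item correctly and follows the edited fact on the other, which is the kind of behavior the benchmark is meant to expose. A real evaluation would replace `copy_last_word` with a call to the model under test and would likely need a more forgiving answer-matching rule than exact match.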
Taxonomy
Topics: Topic Modeling · AI in Service Interactions · Artificial Intelligence in Healthcare and Education
