RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge

Yi Liu; Lianzhe Huang; Shicheng Li; Sishuo Chen; Hao Zhou; Fandong Meng; Jie Zhou; Xu Sun

arXiv:2311.08147 · cs.CL · November 15, 2023 · 6 cites

Open Access

TL;DR

This paper introduces RECALL, a benchmark that evaluates whether large language models can distinguish reliable external knowledge from counterfactual information. The results show that the models are susceptible to misinformation and that simple mitigation strategies have limited effect.

Contribution

The paper presents a new benchmark with two tasks to assess LLMs' robustness against external counterfactual knowledge, highlighting their vulnerability and the need for improved methods.

Findings

01. LLMs are easily misled by unreliable external information.
02. Simple intervention methods have limited effectiveness.
03. Existing models struggle to discern factual from counterfactual knowledge.

Abstract

LLMs and AI chatbots have improved people's efficiency in various fields. However, the necessary knowledge for answering the question may be beyond the models' knowledge boundaries. To mitigate this issue, many researchers try to introduce external knowledge, such as knowledge graphs and Internet contents, into LLMs for up-to-date information. However, the external information from the Internet may include counterfactual information that will confuse the model and lead to an incorrect response. Thus there is a pressing need for LLMs to possess the ability to distinguish reliable information from external knowledge. Therefore, to evaluate the ability of LLMs to discern the reliability of external knowledge, we create a benchmark from existing knowledge bases. Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context…
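The kind of evaluation the abstract describes, where a model answers a question given a context that may contain counterfactual information, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Example` fields, the `evaluate` function, the exact-match scoring, and the toy `copy_model` are hypothetical and are not taken from the paper, whose actual data format and metrics are not given in the text above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    """One hypothetical QA item (field names are illustrative, not from the paper)."""
    question: str
    factual_context: str         # context with the original fact
    counterfactual_context: str  # same context with the key fact altered
    gold_answer: str             # answer supported by the factual context
    edited_answer: str           # answer suggested by the altered context


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so exact-match comparison is less brittle."""
    return " ".join(text.lower().split())


def evaluate(model: Callable[[str, str], str], examples: List[Example]) -> Dict[str, float]:
    """Compare a model's answers under factual and counterfactual contexts.

    `model` is any callable taking (question, context) and returning an answer string.
    """
    if not examples:
        raise ValueError("examples must not be empty")
    correct_factual = correct_counterfactual = misled = 0
    for ex in examples:
        ans_fact = normalize(model(ex.question, ex.factual_context))
        ans_cf = normalize(model(ex.question, ex.counterfactual_context))
        gold = normalize(ex.gold_answer)
        correct_factual += ans_fact == gold
        correct_counterfactual += ans_cf == gold
        # Count cases where the model repeats the altered fact from the context.
        misled += ans_cf == normalize(ex.edited_answer)
    n = len(examples)
    return {
        "accuracy_factual": correct_factual / n,
        "accuracy_counterfactual": correct_counterfactual / n,
        "misled_rate": misled / n,
    }


def copy_model(question: str, context: str) -> str:
    """Toy stand-in for an LLM: blindly returns the last word of the context."""
    return context.split()[-1].rstrip(".")
```

A model that simply copies from its context, like `copy_model`, scores perfectly when the context is factual and is fully misled when the context is altered. A large gap between the two accuracies is the sort of vulnerability the benchmark is meant to expose.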

Taxonomy

Topics: Topic Modeling · AI in Service Interactions · Artificial Intelligence in Healthcare and Education