ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks
Xiaodong Yu, Hao Cheng, Xiaodong Liu, Dan Roth, Jianfeng Gao

TL;DR
ReEval introduces an adversarial evaluation framework that automatically perturbs evidence to test how reliably large language models use new information, revealing their vulnerability to hallucination and the transferability of adversarial examples across models.
Contribution
This paper presents ReEval, a novel LLM-based method that generates adversarial test cases through prompt chaining, enabling dynamic evaluation of how well models use the provided evidence and how robust they are to hallucination.
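The prompt-chaining idea can be illustrated with a minimal Python sketch, in which the output of one prompt feeds the next. It assumes a generic text-in, text-out `llm` callable and a hypothetical chain: propose a substitute answer, rewrite the evidence to support it, then apply a simple string-match filter. The prompts, the `QAItem` type, and the filter are illustrative assumptions, not the paper's actual prompts or pipeline. A scripted stand-in replaces a real LLM so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interface: any text-in, text-out model (e.g. a wrapper around an LLM API).
LLM = Callable[[str], str]


@dataclass(frozen=True)
class QAItem:
    question: str
    evidence: str
    answer: str


def propose_new_answer(llm: LLM, item: QAItem) -> str:
    # Link 1 of the chain: ask for a plausible alternative answer of the same type.
    prompt = (f"Question: {item.question}\nOriginal answer: {item.answer}\n"
              "Propose a different but plausible answer of the same type.")
    return llm(prompt).strip()


def rewrite_evidence(llm: LLM, item: QAItem, new_answer: str) -> str:
    # Link 2: feed link 1's output back in and rewrite the evidence to support it.
    prompt = (f"Evidence: {item.evidence}\n"
              f"Rewrite the evidence so it supports '{new_answer}' instead of "
              f"'{item.answer}', changing as little as possible.")
    return llm(prompt).strip()


def is_valid(original: QAItem, candidate: QAItem) -> bool:
    # Link 3 (assumed heuristic check): the new answer must appear in the new
    # evidence, and the original answer must no longer appear in it.
    ev = candidate.evidence.lower()
    new, old = candidate.answer.lower(), original.answer.lower()
    return new != old and new in ev and old not in ev


def perturb(llm: LLM, item: QAItem) -> Optional[QAItem]:
    # Run the chain end to end; return None if the perturbation fails the check.
    new_answer = propose_new_answer(llm, item)
    new_evidence = rewrite_evidence(llm, item, new_answer)
    candidate = QAItem(item.question, new_evidence, new_answer)
    return candidate if is_valid(item, candidate) else None


def scripted_llm(prompt: str) -> str:
    # Offline stand-in for a real LLM: fixed substitute answer, naive string swap.
    if prompt.startswith("Question:"):
        return "Lyon"
    evidence = prompt.split("\n")[0].removeprefix("Evidence: ")
    return evidence.replace("Paris", "Lyon")


if __name__ == "__main__":
    seed = QAItem("What is the capital of France?",
                  "Paris is the capital and largest city of France.", "Paris")
    print(perturb(scripted_llm, seed))
```

Keeping each link as a separate function makes it easy to swap prompts or insert further checks; in a real setting the filter would likely need to be stronger than substring matching.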
Findings
Adversarial examples effectively trigger hallucinations in LLMs.
Models show significant accuracy drops on perturbed data.
Adversarial examples transfer across different LLMs, including GPT-4.
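The accuracy-drop and transfer findings can be made concrete with a small sketch: score each model on the original and the perturbed items, then report the difference, reusing one fixed perturbed set for every model. The `Reader` interface, the lenient substring match, and the two toy readers are assumptions for illustration, not the paper's metric or evaluated models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class QAItem:
    question: str
    evidence: str
    answer: str


# Hypothetical reader interface: (question, evidence) -> predicted answer.
Reader = Callable[[str, str], str]


def accuracy(reader: Reader, items: List[QAItem]) -> float:
    # Lenient match: a hit if the gold answer appears in the prediction.
    if not items:
        raise ValueError("items must be non-empty")
    hits = sum(item.answer.lower() in reader(item.question, item.evidence).lower()
               for item in items)
    return hits / len(items)


def accuracy_drop(readers: Dict[str, Reader], original: List[QAItem],
                  perturbed: List[QAItem]) -> Dict[str, float]:
    # The same perturbed set is reused for every reader, which is what lets
    # one check whether adversarial examples transfer across models.
    return {name: accuracy(r, original) - accuracy(r, perturbed)
            for name, r in readers.items()}


def memorizer(question: str, evidence: str) -> str:
    # Toy reader that ignores the evidence entirely (pure memorization).
    return "Paris"


def evidence_reader(question: str, evidence: str) -> str:
    # Toy reader that copies the subject of the evidence sentence.
    return evidence.split(" is ")[0]


if __name__ == "__main__":
    original = [QAItem("What is the capital of France?",
                       "Paris is the capital of France.", "Paris")]
    perturbed = [QAItem("What is the capital of France?",
                        "Lyon is the capital of France.", "Lyon")]
    readers = {"memorizer": memorizer, "evidence_reader": evidence_reader}
    print(accuracy_drop(readers, original, perturbed))
```

A reader that relies on memorized knowledge loses accuracy once the evidence is perturbed, while one that grounds its answer in the evidence does not; this is the kind of contrast the perturbed data is meant to expose.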
Abstract
Despite remarkable advances in mitigating hallucinations in large language models (LLMs) through retrieval augmentation, it remains challenging to measure the reliability of LLMs using static question-answering (QA) data. Specifically, given the potential for data contamination (e.g., leading to memorization), good performance on a static benchmark does not ensure that a model can reliably use the provided evidence when responding, which is essential to avoid hallucination when the required knowledge is new or private. Inspired by adversarial machine learning, we investigate the feasibility of automatically perturbing existing static benchmarks for dynamic evaluation. Specifically, this paper presents ReEval, an LLM-based framework that uses prompt chaining to perturb the original evidence, generating new test cases for evaluating LLMs' reliability in using new evidence to answer questions. We implement…
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Dense Connections · Residual Connection · Absolute Position Encodings · Adam · Byte Pair Encoding
