ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks

Xiaodong Yu; Hao Cheng; Xiaodong Liu; Dan Roth; Jianfeng Gao

arXiv:2310.12516 · cs.CL · June 4, 2024 · 2 cites

TL;DR

ReEval introduces an adversarial evaluation framework that automatically perturbs evidence to test how reliably large language models use new information, revealing both their vulnerability to hallucination and the transferability of adversarial examples across models.

Contribution

This paper presents ReEval, a novel LLM-based method for generating adversarial test cases through prompt chaining, enabling dynamic evaluation of models' evidence utilization and hallucination robustness.
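As a rough illustration of what prompt chaining for evidence perturbation could look like, the sketch below chains three prompts, each consuming the previous step's output. The source does not spell out the chain's steps, so the three steps here (propose a new answer, rewrite the evidence, verify the rewrite), the prompt wording, the `QAExample` type, and the `llm` callable are all assumptions made for illustration, not ReEval's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interface: prompt text in, completion text out.
LLM = Callable[[str], str]


@dataclass
class QAExample:
    question: str
    evidence: str
    answer: str


def perturb_example(example: QAExample, llm: LLM) -> Optional[QAExample]:
    """Chain three prompts; return a perturbed example or None if rejected."""
    # Step 1 (assumed): ask for a plausible answer that differs from the original.
    new_answer = llm(
        f"Question: {example.question}\n"
        f"Original answer: {example.answer}\n"
        "Propose a different, plausible answer of the same type."
    ).strip()
    if not new_answer or new_answer.lower() == example.answer.lower():
        return None

    # Step 2 (assumed): rewrite the evidence so that it supports the new answer.
    new_evidence = llm(
        f"Evidence: {example.evidence}\n"
        f"Rewrite the evidence so that it supports the answer '{new_answer}' "
        f"to the question '{example.question}'."
    ).strip()

    # Step 3 (assumed): keep the test case only if a check confirms the rewrite.
    verdict = llm(
        f"Evidence: {new_evidence}\n"
        f"Question: {example.question}\n"
        f"Does the evidence support the answer '{new_answer}'? Reply yes or no."
    ).strip().lower()
    if not verdict.startswith("yes"):
        return None
    return QAExample(example.question, new_evidence, new_answer)


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model, used only to exercise the chain."""
    if "Propose a different" in prompt:
        return "Lyon"
    if "Rewrite the evidence" in prompt:
        return "Lyon is the capital of France."
    return "yes"


original = QAExample(
    question="What is the capital of France?",
    evidence="Paris is the capital of France.",
    answer="Paris",
)
perturbed = perturb_example(original, fake_llm)
```

In practice `fake_llm` would be replaced by a call to a real model; a model that answers from memory ("Paris") instead of from the perturbed evidence ("Lyon") would then count as failing the new test case.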

Findings

1. Adversarial examples effectively trigger hallucinations in LLMs.
2. Models show significant accuracy drops on perturbed data.
3. Adversarial examples transfer across different LLMs, including GPT-4.
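One way to quantify an accuracy drop of the kind listed above is to score the same model on the original and the perturbed test cases and take the difference. The sketch below assumes normalized exact-match scoring, which the source does not confirm as the paper's metric; the function names and the toy answers are invented for illustration and are not results from the paper.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], golds: Sequence[str]) -> float:
    """Fraction of predictions equal to the gold answer, ignoring case and outer whitespace."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(p.strip().lower() == g.strip().lower() for p, g in zip(predictions, golds))
    return hits / len(golds)


def accuracy_drop(
    orig_preds: Sequence[str],
    orig_golds: Sequence[str],
    pert_preds: Sequence[str],
    pert_golds: Sequence[str],
) -> float:
    """Accuracy on original cases minus accuracy on perturbed cases."""
    return exact_match_accuracy(orig_preds, orig_golds) - exact_match_accuracy(
        pert_preds, pert_golds
    )


# Toy data (made up): the model answers every original case correctly, but on
# two perturbed cases it repeats the memorized answer instead of the new evidence.
orig_golds = ["Paris", "1969", "Mars", "Nile"]
orig_preds = ["Paris", "1969", "Mars", "Nile"]
pert_golds = ["Lyon", "1972", "Venus", "Amazon"]
pert_preds = ["Lyon", "1969", "Venus", "Nile"]

drop = accuracy_drop(orig_preds, orig_golds, pert_preds, pert_golds)
```

Transferability can be probed with the same helper by generating the perturbed cases against one model and scoring a different model's predictions on them.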

Abstract

Despite remarkable advancements in mitigating hallucinations in large language models (LLMs) through retrieval augmentation, it remains challenging to measure the reliability of LLMs using static question-answering (QA) data. Specifically, given the potential for data contamination (e.g., leading to memorization), good static benchmark performance does not ensure that a model can reliably use the provided evidence when responding, which is essential to avoid hallucination when the required knowledge is new or private. Inspired by adversarial machine learning, we investigate the feasibility of automatically perturbing existing static benchmarks for dynamic evaluation. Specifically, this paper presents ReEval, an LLM-based framework that uses prompt chaining to perturb the original evidence and generate new test cases for evaluating the LLMs' reliability in using new evidence for answering. We implement…

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Dense Connections · Residual Connection · Absolute Position Encodings · Adam · Byte Pair Encoding