Evaluating and Safeguarding the Adversarial Robustness of Retrieval-Based In-Context Learning

Simon Yu; Jie He; Pasquale Minervini; Jeff Z. Pan

arXiv:2405.15984 · cs.CL · October 10, 2024

TL;DR

This paper investigates the adversarial robustness of retrieval-augmented in-context learning (ICL) with large language models, proposing a training-free defense method called DARD that improves robustness against attacks.

Contribution

It introduces DARD, a training-free adversarial defense method for retrieval-augmented ICL, and provides a comprehensive analysis of robustness against various adversarial attacks.
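As a hypothetical illustration (not DARD's confirmed procedure), one way a defence for retrieval-augmented ICL can stay training-free is to leave the model untouched and instead enrich the demonstration pool with perturbed copies of its labelled examples, so the retriever can surface adversarial-looking demonstrations at inference time. In this Python sketch, `swap_adjacent_chars` is a toy character-swap perturbation standing in for a real attack, and `augment_pool` and `Example` are illustrative names, not the paper's code.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label id)


def swap_adjacent_chars(text: str, rng: random.Random) -> str:
    """Toy character-level perturbation (hypothetical stand-in for a real attack):
    swap two adjacent inner characters of one randomly chosen word with >= 4 chars."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # first and last characters stay fixed
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def augment_pool(pool: List[Example],
                 perturb: Callable[[str, random.Random], str],
                 copies: int = 1,
                 seed: int = 0) -> List[Example]:
    """Return the original pool plus up to `copies` perturbed versions of each
    example, each keeping its gold label. No model weights are updated."""
    rng = random.Random(seed)
    augmented = list(pool)
    for text, label in pool:
        for _ in range(copies):
            adv = perturb(text, rng)
            if adv != text:  # skip perturbations that left the text unchanged
                augmented.append((adv, label))
    return augmented


pool = [("the movie was great fun", 1), ("a dull and boring film", 0)]
augmented = augment_pool(pool, swap_adjacent_chars)
for text, label in augmented:
    print(label, text)
```

Because only the pool changes, the same frozen LLM and retriever can be reused; the main cost is a larger index to search at inference time.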

Findings

01. Retrieval-augmented ICL is more robust than vanilla ICL against test-sample attacks.

02. Retrieval-augmented ICL is overconfident in its demonstrations, which makes it more vulnerable to demonstration attacks.

03. DARD reduces the attack success rate (ASR) by 15% compared with the baselines.
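A minimal Python sketch of an Attack Success Rate (ASR) computation, assuming a convention common in text adversarial-attack toolkits: examples the model already gets wrong before the attack are skipped, so only originally correct predictions count toward the denominator. The function name and inputs are hypothetical, and the paper's exact definition may differ.

```python
from typing import Sequence


def attack_success_rate(labels: Sequence[int],
                        clean_preds: Sequence[int],
                        attacked_preds: Sequence[int]) -> float:
    """Fraction of originally correct predictions that the attack flips."""
    if not (len(labels) == len(clean_preds) == len(attacked_preds)):
        raise ValueError("all inputs must have the same length")
    attempted = 0
    succeeded = 0
    for gold, clean, adv in zip(labels, clean_preds, attacked_preds):
        if clean != gold:
            continue  # skipped: already wrong, nothing for the attacker to flip
        attempted += 1
        if adv != gold:
            succeeded += 1
    return succeeded / attempted if attempted else 0.0


labels = [1, 0, 1, 1, 0]
clean = [1, 0, 0, 1, 0]     # the third example is wrong before any attack
attacked = [0, 0, 0, 1, 1]
print(attack_success_rate(labels, clean, attacked))  # 0.5 (2 flips out of 4 attempts)
```

Lower is better: a more robust ICL setup flips fewer of its originally correct predictions.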

Abstract

With the emergence of large language models, such as LLaMA and OpenAI GPT-3, In-Context Learning (ICL) gained significant attention due to its effectiveness and efficiency. However, ICL is very sensitive to the choice, order, and verbaliser used to encode the demonstrations in the prompt. Retrieval-Augmented ICL methods try to address this problem by leveraging retrievers to extract semantically related examples as demonstrations. While this approach yields more accurate results, its robustness against various types of adversarial attacks, including perturbations on test samples, demonstrations, and retrieved data, remains under-explored. Our study reveals that retrieval-augmented models can enhance robustness against test sample attacks, outperforming vanilla ICL with a 4.87% reduction in Attack Success Rate (ASR); however, they exhibit overconfidence in the demonstrations, leading to…
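The retrieval step described in the abstract (using a retriever to pick semantically related examples as demonstrations) can be sketched in Python as follows. Assumptions: a bag-of-words cosine similarity stands in for the learned retrievers used in practice, the `Input:`/`Label:` template and the `verbaliser` mapping are illustrative, and `retrieve_demonstrations` and `build_prompt` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Example = Tuple[str, int]  # (text, label id)


def bow(text: str) -> Counter:
    """Bag-of-words counts: a toy stand-in for a learned dense retriever."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_demonstrations(query: str, pool: List[Example], k: int = 2) -> List[Example]:
    """Return the k pool examples most similar to the query, most similar first."""
    q = bow(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bow(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, demos: List[Example], verbaliser: Dict[int, str]) -> str:
    """Concatenate the verbalised demonstrations, then the unlabelled test input."""
    blocks = [f"Input: {text}\nLabel: {verbaliser[label]}" for text, label in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


pool = [
    ("the movie was great fun", 1),
    ("a dull and boring film", 0),
    ("great acting and a fun plot", 1),
    ("the food was cold", 0),
]
verbaliser = {0: "negative", 1: "positive"}
query = "a fun and great movie"
demos = retrieve_demonstrations(query, pool, k=2)
print(build_prompt(query, demos, verbaliser))
```

Placing the most similar demonstration first is one choice among several; since ICL is sensitive to demonstration order, putting the most similar example nearest the test input is an equally reasonable variant.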

Code & Models

Repositories

simonucl/adv-retreival-icl
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Adam · Dropout · Dense Connections · Softmax · Layer Normalization · Cosine Annealing