Evaluating and Safeguarding the Adversarial Robustness of Retrieval-Based In-Context Learning
Simon Yu, Jie He, Pasquale Minervini, Jeff Z. Pan

TL;DR
This paper investigates the adversarial robustness of retrieval-augmented in-context learning (ICL) with large language models, proposing a training-free defense method called DARD that improves robustness against attacks.
Contribution
It introduces DARD, a training-free adversarial defense method for retrieval-augmented ICL, and provides a comprehensive analysis of robustness against various adversarial attacks.
Findings
Retrieval-augmented ICL improves robustness against test sample attacks, with a 4.87% lower Attack Success Rate (ASR) than vanilla ICL.
Overconfidence in demonstrations increases vulnerability to demonstration attacks.
DARD reduces attack success rate by 15% compared to baselines.
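The findings above are stated in terms of Attack Success Rate (ASR). As a minimal sketch, assuming the common definition (the share of originally correct predictions that an attack turns into errors), ASR could be computed as follows. The function name `attack_success_rate` and the toy data are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
def attack_success_rate(labels, clean_preds, attacked_preds):
    """Fraction of originally correct predictions that an attack flips to wrong.

    Assumed definition for illustration; not confirmed by the source page.
    """
    if not (len(labels) == len(clean_preds) == len(attacked_preds)):
        raise ValueError("all inputs must have the same length")
    # Only examples the model got right before the attack count as targets.
    targets = [i for i, (y, p) in enumerate(zip(labels, clean_preds)) if y == p]
    if not targets:
        return 0.0
    # An attack succeeds when a previously correct prediction becomes wrong.
    flipped = sum(1 for i in targets if attacked_preds[i] != labels[i])
    return flipped / len(targets)


# Toy data (hypothetical): five examples with binary labels.
labels = [1, 0, 1, 1, 0]
clean = [1, 0, 1, 0, 0]      # four of five correct before the attack
attacked = [0, 0, 1, 1, 1]   # two of those four are flipped
asr = attack_success_rate(labels, clean, attacked)
print(asr)
```

Under this definition a lower ASR means a more robust model, which is the sense in which the reductions above are improvements.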
Abstract
With the emergence of large language models, such as LLaMA and OpenAI GPT-3, In-Context Learning (ICL) has gained significant attention due to its effectiveness and efficiency. However, ICL is very sensitive to the choice, order, and verbaliser used to encode the demonstrations in the prompt. Retrieval-Augmented ICL methods try to address this problem by leveraging retrievers to extract semantically related examples as demonstrations. While this approach yields more accurate results, its robustness against various types of adversarial attacks, including perturbations on test samples, demonstrations, and retrieved data, remains under-explored. Our study reveals that retrieval-augmented models can enhance robustness against test sample attacks, outperforming vanilla ICL with a 4.87% reduction in Attack Success Rate (ASR); however, they exhibit overconfidence in the demonstrations, leading to…
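To make the retrieval step described in the abstract concrete, here is a rough sketch of retrieval-based demonstration selection. It assumes a bag-of-words cosine similarity as a stand-in retriever. The retrievers actually used in the paper are not named on this page, and the helper names (`retrieve_demonstrations`, `build_prompt`), the prompt template, and the example pool are all hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector; a stand-in for a real sentence encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_demonstrations(query, pool, k=2):
    """Return the k (text, label) pairs most similar to the query."""
    q = bow(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bow(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, demos):
    """Format retrieved demonstrations plus the test sample as an ICL prompt."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


# Hypothetical labelled pool and test sample.
pool = [
    ("the movie was great fun", "positive"),
    ("the plot was dull and slow", "negative"),
    ("stock prices fell sharply today", "negative"),
    ("a great movie with a great cast", "positive"),
]
query = "was the movie great"
demos = retrieve_demonstrations(query, pool, k=2)
prompt = build_prompt(query, demos)
print(prompt)
```

Because the demonstrations are chosen by similarity to the test sample, perturbing either the test sample or the pool entries changes what the model sees in its prompt, which is why the abstract lists test samples, demonstrations, and retrieved data as separate attack surfaces.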
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Adam · Dropout · Dense Connections · Softmax · Layer Normalization · Cosine Annealing
