Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models

Zhibo Hu; Chen Wang; Yanfeng Shu; Helen (Hye-Young) Paik; Liming Zhu

arXiv:2402.07179 · cs.CL · May 22, 2025 · 1 citation

TL;DR

This paper investigates how small prompt modifications can drastically alter the outputs of retrieval-augmented LLMs, introducing methods to both manipulate and detect such perturbations to improve robustness.

Contribution

It introduces Gradient Guided Prompt Perturbation (GGPP) for steering RAG-based LLM outputs and a neuron activation-based detector to enhance model robustness against prompt perturbations.
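
To make the steering idea concrete, here is a minimal toy sketch of how a short prefix can flip which passage a retriever ranks first. It is not the paper's GGPP algorithm: the bag-of-words `embed` function, the two passages, the candidate `vocab`, and the greedy token search in `greedy_prefix` are all assumptions made for illustration, and the greedy search stands in for the gradient guidance that GGPP applies to a neural encoder.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (assumption; a real RAG system uses a neural encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, passages):
    """Return the index of the passage most similar to the query (top-1 retrieval)."""
    q = embed(query)
    return max(range(len(passages)), key=lambda i: cosine(q, embed(passages[i])))

def greedy_prefix(query, passages, target, vocab, max_len=3):
    """Hypothetical stand-in for gradient guidance: at each step, add the
    vocabulary token that most raises similarity to the target passage,
    stopping once the target is retrieved or max_len tokens are used."""
    prefix = []
    target_vec = embed(passages[target])
    for _ in range(max_len):
        if retrieve(" ".join(prefix + [query]), passages) == target:
            break
        best = max(
            vocab,
            key=lambda w: cosine(embed(" ".join(prefix + [w, query])), target_vec),
        )
        prefix.append(best)
    return prefix

# Illustrative data only; none of it comes from the paper.
passages = [
    "paris is the capital of france",    # correct passage
    "berlin is the capital of germany",  # targeted wrong passage
]
query = "what is the capital of france"
vocab = ["berlin", "germany", "paris", "banana"]

prefix = greedy_prefix(query, passages, target=1, vocab=vocab)
perturbed = " ".join(prefix + [query])

print("clean top-1:    ", retrieve(query, passages))
print("prefix:         ", prefix)
print("perturbed top-1:", retrieve(perturbed, passages))
```

In this toy setting the prefix succeeds simply by adding word overlap with the wrong passage. With a dense encoder there is no such shortcut, which is presumably why a gradient signal is used to choose the prefix tokens.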

Findings

01. GGPP successfully directs RAG outputs to targeted wrong answers.

02. The detector effectively identifies prompts with GGPP perturbations.

03. Together, the methods improve the robustness and trustworthiness of RAG-based LLMs.

Abstract

The robustness of large language models (LLMs) becomes increasingly important as their use grows rapidly across a wide range of domains. Retrieval-Augmented Generation (RAG) is considered a means to improve the trustworthiness of text generation from LLMs. However, how the outputs of RAG-based LLMs are affected by slightly different inputs is not well studied. In this work, we find that inserting even a short prefix into the prompt leads to outputs far from the factually correct answers. We systematically evaluate the effect of such prefixes on RAG by introducing a novel optimization technique called Gradient Guided Prompt Perturbation (GGPP). GGPP achieves a high success rate in steering the outputs of RAG-based LLMs to targeted wrong answers. It can also cope with instructions in the prompt that ask the model to ignore irrelevant context. We also exploit LLMs' neuron…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Recommender Systems and Techniques

Methods: Adam · Attention Dropout · WordPiece · Dense Connections · Softmax · Weight Decay · Byte Pair Encoding · Linear Warmup With Linear Decay · BERT