Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models
Zhibo Hu, Chen Wang, Yanfeng Shu, Helen (Hye-Young) Paik, Liming Zhu

TL;DR
This paper investigates how small prompt modifications can drastically alter the outputs of retrieval-augmented LLMs, introducing methods to both manipulate and detect such perturbations to improve robustness.
Contribution
It introduces Gradient Guided Prompt Perturbation (GGPP) for steering RAG-based LLM outputs and a neuron activation-based detector to enhance model robustness against prompt perturbations.
Findings
GGPP successfully directs RAG outputs to targeted wrong answers
The detector effectively identifies prompts with GGPP perturbations
The proposed methods improve the robustness and trustworthiness of RAG-based LLMs
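The findings above refer to a neuron activation-based detector. One common way to build such a detector is a linear (logistic-regression) probe over per-prompt activation vectors. The sketch below is a minimal, hypothetical illustration of that general idea, not the paper's detector. It assumes the activations have already been extracted from the model as fixed-length vectors. The synthetic data, the number of "shifted" neurons, and every function name (`train_probe`, `is_perturbed`, `fake_activation`) are made up purely to exercise the code.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def standardize(X):
    """Z-score each activation dimension; also return the stats to reuse on new prompts."""
    n, d = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / n for j in range(d)]
    sd = [math.sqrt(sum((x[j] - mu[j]) ** 2 for x in X) / n) or 1.0 for j in range(d)]
    return [[(x[j] - mu[j]) / sd[j] for j in range(d)] for x in X], mu, sd

def train_probe(X, y, lr=0.5, epochs=200):
    """Fit a logistic-regression probe p = sigmoid(w.x + b) by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            # derivative of the log-loss with respect to the logit
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def is_perturbed(w, b, x, threshold=0.5):
    """Flag a standardized activation vector as coming from a perturbed prompt."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold

# Synthetic stand-in data (hypothetical): label 1 = perturbed prompt, whose first
# SHIFTED neurons are shifted upward; label 0 = clean prompt.
rng = random.Random(7)
DIM, SHIFTED = 10, 4

def fake_activation(perturbed):
    return [rng.gauss(3.0 if perturbed and j < SHIFTED else 0.0, 1.0) for j in range(DIM)]

X = [fake_activation(i % 2 == 1) for i in range(200)]
y = [i % 2 for i in range(200)]
Xs, mu, sd = standardize(X)
w, b = train_probe(Xs, y)
acc = sum(is_perturbed(w, b, x) == bool(t) for x, t in zip(Xs, y)) / len(y)
print(f"training accuracy: {acc:.2f}")
```

This only reports training accuracy on toy data; a real detector would be scored on held-out clean and perturbed prompts, reusing `mu` and `sd` to standardize them.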
Abstract
The robustness of large language models (LLMs) is becoming increasingly important as their use rapidly grows in a wide range of domains. Retrieval-Augmented Generation (RAG) is considered a means to improve the trustworthiness of text generation from LLMs. However, how the outputs from RAG-based LLMs are affected by slightly different inputs is not well studied. In this work, we find that the insertion of even a short prefix to the prompt leads to the generation of outputs far away from factually correct answers. We systematically evaluate the effect of such prefixes on RAG by introducing a novel optimization technique called Gradient Guided Prompt Perturbation (GGPP). GGPP achieves a high success rate in steering outputs of RAG-based LLMs to targeted wrong answers. It can also cope with instructions in the prompts requesting to ignore irrelevant context. We also exploit LLMs' neuron…
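The abstract describes GGPP only at a high level: an optimized short prefix that steers a RAG pipeline toward a targeted wrong answer. As a rough illustration of the general family of gradient-guided prefix search (HotFlip/GCG-style token swaps, not the authors' implementation), the sketch below assumes a toy retriever that mean-pools token embeddings. It also assumes a hypothetical objective that rewards moving the pooled prompt toward a "target" passage embedding and away from the "correct" one. The gradient ranks candidate tokens for each prefix slot by a first-order estimate, and each shortlisted swap is then scored exactly. The random embeddings, the objective, and all names here are illustrative assumptions.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def mean_pool(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def score(token_ids, emb, target, correct):
    """Toy objective: how much closer the pooled prompt is to `target` than to `correct`."""
    e = mean_pool([emb[t] for t in token_ids])
    return cosine(e, target) - cosine(e, correct)

def grad_cosine(e, v):
    """Gradient of cos(e, v) with respect to e."""
    ne, nv = norm(e), norm(v)
    c = dot(e, v) / (ne * nv)
    return [vi / (ne * nv) - c * ei / (ne * ne) for ei, vi in zip(e, v)]

def gradient_guided_prefix(query_ids, emb, target, correct,
                           prefix_len=3, top_k=8, steps=20, seed=0):
    """Greedy coordinate search over prefix tokens, shortlisted by the gradient."""
    rng = random.Random(seed)
    prefix = [rng.randrange(len(emb)) for _ in range(prefix_len)]
    best = score(prefix + query_ids, emb, target, correct)
    n = prefix_len + len(query_ids)
    for _ in range(steps):
        e = mean_pool([emb[t] for t in prefix + query_ids])
        # d(score)/d(token embedding): mean pooling scales the pooled gradient by 1/n
        g = [(a - c) / n for a, c in zip(grad_cosine(e, target), grad_cosine(e, correct))]
        improved = False
        for pos in range(prefix_len):
            cur = emb[prefix[pos]]

            def est_gain(v):
                # linearized score change from swapping token v into this slot
                return dot([x - y for x, y in zip(emb[v], cur)], g)

            shortlist = sorted(range(len(emb)), key=est_gain, reverse=True)[:top_k]
            for v in shortlist:
                cand = prefix[:pos] + [v] + prefix[pos + 1:]
                s = score(cand + query_ids, emb, target, correct)  # exact re-evaluation
                if s > best:
                    best, prefix, improved = s, cand, True
        if not improved:
            break
    return prefix, best

# Toy setup: random vectors stand in for a real retriever's embeddings (hypothetical).
rng = random.Random(42)
DIM, VOCAB = 16, 50
emb = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(VOCAB)]
target = [rng.gauss(0.0, 1.0) for _ in range(DIM)]   # hypothetical wrong-answer passage
correct = [rng.gauss(0.0, 1.0) for _ in range(DIM)]  # hypothetical correct passage
query = [1, 2, 3, 4]
prefix, s = gradient_guided_prefix(query, emb, target, correct)
print("prefix:", prefix, "score:", round(s, 3),
      "no prefix:", round(score(query, emb, target, correct), 3))
```

Shortlisting by the gradient and then re-scoring exactly keeps the search cheap while guaranteeing the objective never decreases; a real attack would replace the toy pooled embedding with an actual retriever or LLM loss.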
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Recommender Systems and Techniques
Methods: Adam · Attention Dropout · WordPiece · Dense Connections · Softmax · Weight Decay · Byte Pair Encoding · Linear Warmup With Linear Decay · BERT
