PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

Kaijie Zhu; Jindong Wang; Jiaheng Zhou; Zichen Wang; Hao Chen; Yidong Wang; Linyi Yang; Wei Ye; Yue Zhang; Neil Zhenqiang Gong; Xing Xie

arXiv:2306.04528·cs.CL·July 17, 2024·50 cites

Open Access · 1 Repo

TL;DR

This paper introduces PromptRobust, a benchmark for evaluating the robustness of large language models against adversarial prompts across multiple tasks, revealing their vulnerability and providing insights for improving prompt design.

Contribution

The study presents a new robustness benchmark, PromptRobust, with a large set of adversarial prompts and analysis of LLM vulnerabilities across diverse tasks and datasets.

Findings

01. LLMs are vulnerable to adversarial prompts
02. Adversarial prompts significantly impact LLM performance
03. Transferability of adversarial prompts varies across models
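The findings above can be made concrete with a minimal sketch of how a performance drop and its transfer across models might be measured. Everything here is an assumption for illustration: the `relative_drop` metric, the two toy models, and the example data are hypothetical and are not taken from the paper or from the `microsoft/promptbench` code.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical model interface: (prompt, input_text) -> predicted label.
Model = Callable[[str, str], str]
Example = Tuple[str, str]  # (input_text, gold_label)


def accuracy(model: Model, prompt: str, examples: Sequence[Example]) -> float:
    """Fraction of examples the model labels correctly under a given prompt."""
    correct = sum(model(prompt, text) == label for text, label in examples)
    return correct / len(examples)


def relative_drop(model: Model, clean_prompt: str, adv_prompt: str,
                  examples: Sequence[Example]) -> float:
    """Accuracy lost under the adversarial prompt, relative to the clean one."""
    clean = accuracy(model, clean_prompt, examples)
    adv = accuracy(model, adv_prompt, examples)
    return 0.0 if clean == 0 else (clean - adv) / clean


def brittle_model(prompt: str, text: str) -> str:
    """Toy model that only follows the task if 'sentiment' is spelled correctly."""
    if "sentiment" not in prompt:
        return "negative"  # falls back to a constant answer
    return "positive" if "good" in text else "negative"


def robust_model(prompt: str, text: str) -> str:
    """Toy model whose behavior does not depend on the prompt wording."""
    return "positive" if "good" in text else "negative"


EXAMPLES = [("good movie", "positive"), ("bad movie", "negative")]
CLEAN = "Classify the sentiment of this review"
ADVERSARIAL = "Classify the sentimnet of this review"  # character-level typo

brittle_drop = relative_drop(brittle_model, CLEAN, ADVERSARIAL, EXAMPLES)
robust_drop = relative_drop(robust_model, CLEAN, ADVERSARIAL, EXAMPLES)
```

In this toy setup the same adversarial prompt halves the accuracy of one model and leaves the other untouched, which is the kind of model-dependent transfer the third finding describes.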

Abstract

The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptRobust, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic. The adversarial prompts, crafted to mimic plausible user errors like typos or synonyms, aim to evaluate how slight deviations can affect LLM outcomes while maintaining semantic integrity. These prompts are then employed in diverse tasks including sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,788 adversarial prompts, meticulously evaluated over 8 tasks and 13…
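As a rough illustration of the attack levels named in the abstract, the sketch below perturbs a prompt at the character, word, and sentence level. It is a simplified stand-in under stated assumptions, not the attack algorithms used in the paper: the function names, the tiny `SYNONYMS` table, and the distractor sentence are all hypothetical.

```python
import random

# Hypothetical synonym table; a real attack would draw on a much larger resource.
SYNONYMS = {"classify": "categorize", "review": "critique"}


def char_level_typo(prompt: str, rng: random.Random) -> str:
    """Swap two adjacent interior letters of one randomly chosen word."""
    words = prompt.split(" ")
    # A word needs at least 4 letters to have two interior letters to swap.
    candidates = [i for i, w in enumerate(words) if len(w) >= 4 and w.isalpha()]
    if not candidates:
        return prompt
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # keeps the first and last letter fixed
    # Note: if the two letters are identical, the word is left unchanged.
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def word_level_synonym(prompt: str, synonyms: dict = SYNONYMS) -> str:
    """Replace each word found in the table with its (lowercase) synonym."""
    return " ".join(synonyms.get(w.lower(), w) for w in prompt.split(" "))


def sentence_level_distractor(prompt: str,
                              distractor: str = "Also, the sky is blue.") -> str:
    """Append an irrelevant sentence that should not change the task."""
    return f"{prompt} {distractor}"


prompt = "Classify the sentiment of this review"
typo_prompt = char_level_typo(prompt, random.Random(0))
synonym_prompt = word_level_synonym(prompt)
distractor_prompt = sentence_level_distractor(prompt)
```

Each perturbation keeps the instruction readable to a human, matching the abstract's goal of mimicking plausible user errors while preserving meaning. Semantic-level attacks (rephrasing the whole prompt) are omitted because they need a paraphrasing or translation component.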

Code & Models

Repositories

microsoft/promptbench · PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Test