Assessing Adversarial Robustness of Large Language Models: An Empirical Study

Zeyu Yang; Zhao Meng; Xiaochen Zheng; Roger Wattenhofer

arXiv:2405.02764 · cs.CL · September 16, 2024 · 2 cites

Open Access

TL;DR

This paper empirically evaluates the adversarial robustness of large language models like Llama, OPT, and T5, revealing vulnerabilities and establishing a new benchmark for their resilience across multiple tasks.

Contribution

It introduces a novel white-box attack method and provides a comprehensive assessment of factors affecting LLM robustness, advancing trustworthy AI development.

Findings

01

Identifies vulnerabilities in open-source LLMs

02

Shows impact of model size and fine-tuning on robustness

03

Establishes a new benchmark for LLM adversarial resilience

Abstract

Large Language Models (LLMs) have revolutionized natural language processing, but their robustness against adversarial attacks remains a critical concern. We present a novel white-box style attack approach that exposes vulnerabilities in leading open-source LLMs, including Llama, OPT, and T5. We assess the impact of model size, structure, and fine-tuning strategies on their resistance to adversarial perturbations. Our comprehensive evaluation across five diverse text classification tasks establishes a new benchmark for LLM robustness. The findings of this study have far-reaching implications for the reliable deployment of LLMs in real-world applications and contribute to the advancement of trustworthy AI systems.

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Attention Is All You Need · Dense Connections · Adafactor · Dropout · Gated Linear Unit · Attention Dropout · Residual Connection · Softmax · Byte Pair Encoding