On Adversarial Robustness and Out-of-Distribution Robustness of Large Language Models

April Yang, Jordan Tab, Parth Shah, and Paul Kotchavong

arXiv:2412.10535·cs.CL·December 17, 2024

TL;DR

This paper examines how adversarial robustness and out-of-distribution (OOD) robustness relate in large language models. It reports limited transferability between the two and trends that depend on the model, with the aim of informing more reliable robustness strategies.

Contribution

It provides the first comprehensive analysis of the correlation between adversarial and OOD robustness in LLMs, highlighting the limited transferability and model-dependent interactions.
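As a rough illustration of what a correlation analysis between the two robustness types could look like, the sketch below computes a Pearson coefficient over per-model scores. The choice of Pearson correlation, the function name `pearson`, and the idea of one score pair per model are assumptions made for illustration; the summary above does not say which statistic the authors used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Hypothetical helper: xs could hold one adversarial-robustness score
    per model and ys the matching OOD-robustness score.
    """
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two score pairs")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Covariance numerator and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient near zero across models would be consistent with limited transferability, while a value near 1 would suggest that gains in one robustness type track gains in the other.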

Findings

01. Limited transferability between robustness types.

02. Model size and architecture influence robustness correlation.

03. Hybrid robustness frameworks are needed for better generalization.

Abstract

The increasing reliance on large language models (LLMs) for diverse applications necessitates a thorough understanding of their robustness to adversarial perturbations and out-of-distribution (OOD) inputs. In this study, we investigate the correlation between adversarial robustness and OOD robustness in LLMs, addressing a critical gap in robustness evaluation. By applying methods originally designed to improve one robustness type across both contexts, we analyze their performance on adversarial and out-of-distribution benchmark datasets. The input of the model consists of text samples, with the output prediction evaluated in terms of accuracy, precision, recall, and F1 scores in various natural language inference tasks. Our findings highlight nuanced interactions between adversarial robustness and OOD robustness, with results indicating limited transferability between the two…
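The four metrics named in the abstract can be computed from gold and predicted labels as in the minimal sketch below. The three-way label set, the macro averaging, and the function name `nli_metrics` are assumptions for illustration; the abstract does not state how the authors averaged across classes.

```python
from collections import Counter

# Assumed three-way NLI label set; a given benchmark may differ.
LABELS = ("entailment", "neutral", "contradiction")


def nli_metrics(gold, pred, labels=LABELS):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("need at least one example")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Running the same function on an in-distribution test set, an adversarial set, and an OOD set gives directly comparable numbers, which is the kind of comparison the abstract describes.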

Code & Models

Repositories

jordantab/llm-robustness-experiment (official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling