On Adversarial Robustness and Out-of-Distribution Robustness of Large Language Models
April Yang, Jordan Tab, Parth Shah, and Paul Kotchavong

TL;DR
This paper explores the relationship between adversarial and out-of-distribution robustness in large language models, revealing nuanced interactions and model-specific trends that inform more reliable robustness strategies.
Contribution
The paper provides the first comprehensive analysis of the correlation between adversarial and OOD robustness in LLMs, highlighting limited transferability between the two and interactions that depend on the model.
Findings
Limited transferability between robustness types.
Model size and architecture influence robustness correlation.
Hybrid robustness frameworks are needed for better generalization.
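One way to make "correlation between robustness types" concrete is to pair each model's (or method's) score on an adversarial benchmark with its score on an OOD benchmark and compute a correlation coefficient over the pairs. The sketch below uses the Pearson coefficient, which is an assumption: this page does not say which statistic the authors use. The function name `pearson` and its arguments are illustrative, not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores.

    xs: e.g. per-model accuracy on an adversarial benchmark (assumed input)
    ys: e.g. per-model accuracy on an OOD benchmark (assumed input)
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Unnormalised covariance and variances; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)
```

A value near 0 would be consistent with limited transferability, while a value near 1 would suggest that gains in one robustness type track gains in the other. With only a handful of models the estimate is noisy, so it is better read as a descriptive summary than as a test.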
Abstract
The increasing reliance on large language models (LLMs) for diverse applications necessitates a thorough understanding of their robustness to adversarial perturbations and out-of-distribution (OOD) inputs. In this study, we investigate the correlation between adversarial robustness and OOD robustness in LLMs, addressing a critical gap in robustness evaluation. We apply methods originally designed to improve one type of robustness in both settings and analyze their performance on adversarial and OOD benchmark datasets. Model inputs are text samples, and output predictions are evaluated by accuracy, precision, recall, and F1 score on several natural language inference tasks. Our findings highlight nuanced interactions between adversarial robustness and OOD robustness, with results indicating limited transferability between the two…
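The four metrics named in the abstract can be computed from gold and predicted labels as in the minimal sketch below. It assumes a three-way NLI label set (entailment, neutral, contradiction) and macro-averaging over classes; the abstract specifies neither, so both are assumptions, and `nli_metrics` is a hypothetical helper name.

```python
from collections import Counter

# Assumed three-way NLI label set; the paper's datasets may differ.
LABELS = ("entailment", "neutral", "contradiction")


def nli_metrics(gold, pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }
```

Running the same function on an in-distribution test set, an adversarial set, and an OOD set gives directly comparable numbers, which is the kind of comparison the abstract describes. Classes with no predictions or no gold examples are scored as 0 here; other conventions exist.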
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
