On Adversarial Robustness of Language Models in Transfer Learning
Bohdan Turbal, Anastasiia Mazur, Jiaxu Zhao, Mykola Pechenizkiy

TL;DR
This paper examines how transfer learning affects the adversarial robustness of large language models, revealing that larger models tend to be more resilient but that transfer learning can increase vulnerability to attacks.
Contribution
It provides a comprehensive experimental analysis of the robustness of various language models in transfer learning, highlighting the impact of model size and architecture.
Findings
Larger models are more resilient to the loss of adversarial robustness that transfer learning causes.
Transfer learning can increase vulnerability despite improving standard metrics.
Model architecture influences robustness in transfer learning scenarios.
Abstract
We investigate the adversarial robustness of LLMs in transfer learning scenarios. Through comprehensive experiments on multiple datasets (MBIB Hate Speech, MBIB Political Bias, MBIB Gender Bias) and various model architectures (BERT, RoBERTa, GPT-2, Gemma, Phi), we reveal that transfer learning, while improving standard performance metrics, often leads to increased vulnerability to adversarial attacks. Our findings demonstrate that larger models exhibit greater resilience to this phenomenon, suggesting a complex interplay between model size, architecture, and adaptation methods. Our work highlights the crucial need for considering adversarial robustness in transfer learning scenarios and provides insights into maintaining model security without compromising performance. These findings have significant implications for the development and deployment of LLMs in real-world applications…
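The abstract contrasts standard performance metrics with vulnerability to adversarial attacks. As a minimal sketch of how that contrast can be measured, the Python below computes clean accuracy, accuracy under attack, and the relative drop between them. Everything in it is an illustrative assumption and not the authors' setup: the toy keyword classifier, the adjacent-character-swap perturbation, the four-example dataset, and the helper names (`accuracy`, `adversarial_accuracy`, `robustness_drop`) are hypothetical, and the summary does not state which attacks or metrics the paper uses.

```python
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[str], int]      # maps a text to a predicted label
Attack = Callable[[str], List[str]]    # maps a text to perturbed variants
Dataset = Sequence[Tuple[str, int]]    # (text, gold label) pairs


def accuracy(model: Classifier, data: Dataset) -> float:
    """Fraction of examples the model labels correctly on unperturbed text."""
    correct = sum(1 for text, label in data if model(text) == label)
    return correct / len(data)


def adversarial_accuracy(model: Classifier, data: Dataset, attack: Attack) -> float:
    """Fraction of examples that stay correct on the original text
    and on every perturbed variant the attack produces."""
    robust = 0
    for text, label in data:
        candidates = [text] + attack(text)
        if all(model(candidate) == label for candidate in candidates):
            robust += 1
    return robust / len(data)


def robustness_drop(clean_acc: float, adv_acc: float) -> float:
    """Share of clean accuracy that is lost under attack (0.0 means no loss)."""
    if clean_acc == 0:
        return 0.0
    return (clean_acc - adv_acc) / clean_acc


def swap_adjacent_chars(text: str) -> List[str]:
    """Hypothetical attack: every variant obtained by one adjacent-character swap."""
    variants = []
    for i in range(len(text) - 1):
        if text[i] != text[i + 1]:
            chars = list(text)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            variants.append("".join(chars))
    return variants


def keyword_model(text: str) -> int:
    """Hypothetical stand-in for a classifier: flags any text containing 'hate'."""
    return 1 if "hate" in text.lower() else 0


# Invented toy data, used only to exercise the functions above.
DATA = [
    ("i hate this", 1),
    ("what a nice day", 0),
    ("hate speech example", 1),
    ("good morning", 0),
]

clean = accuracy(keyword_model, DATA)
adv = adversarial_accuracy(keyword_model, DATA, swap_adjacent_chars)
drop = robustness_drop(clean, adv)
print(f"clean={clean:.2f} adversarial={adv:.2f} relative drop={drop:.2f}")
```

To study the effect the abstract describes, one would presumably run the same measurement on a model before and after adaptation and compare the two drops. A model can score higher on clean accuracy after transfer learning while also showing a larger relative drop under attack.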
Taxonomy
Topics: Interpreting and Communication in Healthcare · Adversarial Robustness in Machine Learning
Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · BERT · Linear Layer · Softmax · Dense Connections · Dropout · Cosine Annealing
