On Adversarial Robustness of Language Models in Transfer Learning

Bohdan Turbal; Anastasiia Mazur; Jiaxu Zhao; Mykola Pechenizkiy

arXiv:2501.00066·cs.CL·June 10, 2025

Open Access

TL;DR

This paper examines how transfer learning affects the adversarial robustness of language models. Transfer learning often improves standard performance metrics while increasing vulnerability to adversarial attacks, and larger models are more resilient to this effect.

Contribution

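The paper provides an experimental analysis of adversarial robustness under transfer learning across three MBIB datasets (Hate Speech, Political Bias, Gender Bias) and five model families (BERT, RoBERTa, GPT-2, Gemma, Phi), highlighting the impact of model size and architecture.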

Findings

1. Larger models show greater robustness to adversarial attacks.

2. Transfer learning can increase vulnerability despite improving standard metrics.

3. Model architecture influences robustness in transfer learning scenarios.

Abstract

We investigate the adversarial robustness of LLMs in transfer learning scenarios. Through comprehensive experiments on multiple datasets (MBIB Hate Speech, MBIB Political Bias, MBIB Gender Bias) and various model architectures (BERT, RoBERTa, GPT-2, Gemma, Phi), we reveal that transfer learning, while improving standard performance metrics, often leads to increased vulnerability to adversarial attacks. Our findings demonstrate that larger models exhibit greater resilience to this phenomenon, suggesting a complex interplay between model size, architecture, and adaptation methods. Our work highlights the crucial need for considering adversarial robustness in transfer learning scenarios and provides insights into maintaining model security without compromising performance. These findings have significant implications for the development and deployment of LLMs in real-world applications…

Taxonomy

Topics: Interpreting and Communication in Healthcare · Adversarial Robustness in Machine Learning

Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · BERT · Linear Layer · Softmax · Dense Connections · Dropout · Cosine Annealing