Adversarial Robustness of Open-source Text Classification Models and Fine-Tuning Chains

Hao Qin; Mingyang Li; Junjie Wang; Qing Wang

arXiv:2408.02963 · cs.SE · August 7, 2024

TL;DR

This study investigates the adversarial robustness of open-source text classification models and their fine-tuning chains. It finds that the models are highly vulnerable to adversarial attacks, that fine-tuning makes this vulnerability worse, and it identifies the factors that influence robustness.

Contribution

The paper provides the first empirical analysis of adversarial risks in open-source models and their fine-tuning chains, highlighting where these models are vulnerable and which factors influence that vulnerability.

Findings

1. The models have an average attack success rate of 52.70%.
2. Fine-tuning increases attack success rates by 12.60%.
3. Factors such as the attack method, the dataset, and the model architecture affect robustness.
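The summary does not define the attack success rate (ASR). A common convention, used for example in the TextAttack library's summary output, counts only inputs the model originally classified correctly and reports the share whose prediction the attack flipped, i.e. successful / (successful + failed). The sketch below assumes that convention; the `AttackResult` record and the helper functions are hypothetical illustrations, not the paper's code, and the toy inputs are not data from the study.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AttackResult:
    """Outcome of attacking one input (hypothetical record for illustration)."""
    originally_correct: bool  # the model classified the clean input correctly
    prediction_flipped: bool  # the adversarial input changed the prediction


def attack_success_rate(results: Iterable[AttackResult]) -> float:
    """Share of attempted attacks that flipped the model's prediction.

    Inputs the model already misclassified are skipped rather than counted,
    mirroring the successful / (successful + failed) convention (assumed).
    """
    attempted = [r for r in results if r.originally_correct]
    if not attempted:
        return 0.0
    successes = sum(r.prediction_flipped for r in attempted)
    return successes / len(attempted)


def mean_asr(per_model_asr: dict[str, float]) -> float:
    """Unweighted mean ASR over models (one simple way to average)."""
    return sum(per_model_asr.values()) / len(per_model_asr)


def asr_change(upstream: float, downstream: float) -> tuple[float, float]:
    """Return (absolute change, relative change) from an upstream to a downstream model."""
    absolute = downstream - upstream
    # Relative change is undefined for a zero baseline; report NaN in that case.
    relative = absolute / upstream if upstream != 0 else float("nan")
    return absolute, relative


if __name__ == "__main__":
    toy = [
        AttackResult(True, True),
        AttackResult(True, False),
        AttackResult(False, False),  # skipped: already misclassified
        AttackResult(True, True),
    ]
    print(attack_success_rate(toy))  # 2 of 3 attempted attacks succeed
```

The fine-tuning finding does not state whether its 12.60% figure is an absolute difference in percentage points or a relative increase. The two can differ substantially (moving from an ASR of 0.50 to 0.60 is +10 points but a 20% relative increase), which is why `asr_change` reports both.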

Abstract

Context: With the advancement of artificial intelligence (AI) technology and applications, numerous AI models have been developed, leading to the emergence of open-source model hosting platforms such as Hugging Face (HF). Thanks to these platforms, individuals can directly download and use models, and can fine-tune them to build more domain-specific models. However, just as traditional software supply chains face security risks, AI models and fine-tuning chains face new security risks, such as adversarial attacks. The adversarial robustness of these models has therefore attracted attention and may influence people's choices of open-source models.

Objective: This paper aims to explore the adversarial robustness of open-source AI models and of the chains formed by their upstream-downstream relationships via fine-tuning, to provide insights into the potential…
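A fine-tuning chain can be viewed as a path through upstream-downstream links, where each model points to the model it was fine-tuned from. The sketch below assumes those links have already been collected into a plain dictionary (on Hugging Face, for instance, a model card may declare a `base_model` in its metadata); it makes no Hub API calls, and the function names and model IDs are hypothetical.

```python
from typing import Optional

# Hypothetical input: maps each model ID to the ID it was fine-tuned from
# (None for a root model). Assumed to be collected beforehand, e.g. from
# model-card metadata; no Hugging Face API is called here.
BaseMap = dict[str, Optional[str]]


def fine_tuning_chain(model_id: str, base_of: BaseMap) -> list[str]:
    """Return the chain from the root upstream model down to model_id."""
    chain = [model_id]
    seen = {model_id}
    current = base_of.get(model_id)
    while current is not None:
        # Guard against malformed metadata that links models in a loop.
        if current in seen:
            raise ValueError(f"cycle in fine-tuning links at {current!r}")
        chain.append(current)
        seen.add(current)
        current = base_of.get(current)
    chain.reverse()  # root first, most downstream model last
    return chain


def upstream_downstream_pairs(base_of: BaseMap) -> list[tuple[str, str]]:
    """List every (upstream, downstream) fine-tuning edge."""
    return [(base, child) for child, base in base_of.items() if base is not None]


if __name__ == "__main__":
    links: BaseMap = {
        "example/base-encoder": None,
        "example/sentiment-ft": "example/base-encoder",
        "example/reviews-ft": "example/sentiment-ft",
    }
    # prints ['example/base-encoder', 'example/sentiment-ft', 'example/reviews-ft']
    print(fine_tuning_chain("example/reviews-ft", links))
```

Representing each model by a single parent link keeps chains linear, which matches the idea of a model fine-tuned from one upstream model; models merged from several parents would need a list of bases instead.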

Taxonomy

Topics: Adversarial Robustness in Machine Learning