Adversarial Robustness of Open-source Text Classification Models and Fine-Tuning Chains
Hao Qin, Mingyang Li, Junjie Wang, Qing Wang

TL;DR
This study investigates the adversarial robustness of open-source text classification models and their fine-tuning chains, revealing significant vulnerability to attacks and the exacerbating effect of fine-tuning, with insights into influencing factors.
Contribution
It provides the first empirical analysis of adversarial risks in open-source models and their fine-tuning chains, highlighting vulnerabilities and influencing factors.
Findings
Models have an average 52.70% attack success rate.
Fine-tuning increases attack success rates by 12.60%.
Factors like attack methods, datasets, and architectures affect robustness.
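The summary does not say how attack success rate is computed. The sketch below assumes one common convention: only inputs the model classifies correctly before the attack count as attempts, and an attack succeeds if the perturbed input is then misclassified. The `AttackRecord` type, its field names, and the toy records are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttackRecord:
    """One evaluated example (hypothetical structure, for illustration only)."""
    true_label: int   # ground-truth class
    clean_pred: int   # model prediction on the original text
    adv_pred: int     # model prediction on the adversarially perturbed text


def attack_success_rate(records: List[AttackRecord]) -> float:
    """Fraction of correctly classified inputs that the attack flips.

    Assumed convention: examples the model already gets wrong are skipped,
    because there is no correct prediction left to break.
    """
    attempted = [r for r in records if r.clean_pred == r.true_label]
    if not attempted:
        return 0.0
    succeeded = sum(1 for r in attempted if r.adv_pred != r.true_label)
    return succeeded / len(attempted)


# Toy data: four examples, one of which is misclassified before any attack.
records = [
    AttackRecord(true_label=1, clean_pred=1, adv_pred=0),  # attack succeeds
    AttackRecord(true_label=0, clean_pred=0, adv_pred=0),  # attack fails
    AttackRecord(true_label=1, clean_pred=0, adv_pred=0),  # skipped: wrong before attack
    AttackRecord(true_label=0, clean_pred=0, adv_pred=1),  # attack succeeds
]

rate = attack_success_rate(records)
print(f"attack success rate: {rate:.2%}")
```

Under this convention, an upstream model and its fine-tuned downstream model could be compared by running the same attack on both and comparing the two rates. Whether the reported 12.60% increase is an absolute or a relative difference is not stated in this summary.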
Abstract
Context: With the advancement of artificial intelligence (AI) technology and applications, numerous AI models have been developed, leading to the emergence of open-source model hosting platforms like Hugging Face (HF). Thanks to these platforms, individuals can directly download and use models, as well as fine-tune them to construct more domain-specific models. However, just as traditional software supply chains face security risks, AI models and fine-tuning chains also encounter new security risks, such as adversarial attacks. Therefore, the adversarial robustness of these models has garnered attention, potentially influencing people's choices regarding open-source models. Objective: This paper aims to explore the adversarial robustness of open-source AI models and their chains formed by the upstream-downstream relationships via fine-tuning to provide insights into the potential…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
