Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!
Xuanli He, Lingjuan Lyu, Qiongkai Xu, Lichao Sun

TL;DR
This paper demonstrates how BERT-based NLP APIs are vulnerable to model extraction attacks and how the extracted models can be used to generate transferable adversarial examples, highlighting security concerns.
Contribution
The paper shows how to extract fine-tuned BERT models with a limited query budget and demonstrates that adversarial examples crafted on the extracted model transfer to the victim, exposing security vulnerabilities in deployed NLP models.
Findings
Adversaries can successfully steal BERT-based APIs with limited knowledge.
Extracted models enable highly transferable adversarial attacks.
The two defense strategies studied blunt these attacks only when the victim model's performance is sacrificed.
Abstract
Natural language processing (NLP) tasks, ranging from text classification to text generation, have been revolutionised by the pre-trained language models, such as BERT. This allows corporations to easily build powerful APIs by encapsulating fine-tuned BERT models for downstream tasks. However, when a fine-tuned BERT model is deployed as a service, it may suffer from different attacks launched by malicious users. In this work, we first present how an adversary can steal a BERT-based API service (the victim/target model) on multiple benchmark datasets with limited prior knowledge and queries. We further show that the extracted model can lead to highly transferable adversarial attacks against the victim model. Our studies indicate that the potential vulnerabilities of BERT-based API services still hold, even when there is an architectural mismatch between the victim model and the attack…
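The attack pipeline the abstract describes (query the victim, train a copy on its answers, then craft adversarial inputs on the copy) can be sketched on a toy problem. The following is a minimal illustration under loud assumptions, not the authors' method: `victim_api` is a hypothetical keyword classifier standing in for a fine-tuned BERT service, the surrogate is a bag-of-words perceptron rather than a BERT model, and the vocabulary, query sampler, and single-token substitution attack are all invented for the example.

```python
import random
from collections import Counter

# Toy vocabulary; every name here is hypothetical and stands in for real text.
POSITIVE = ["good", "great", "excellent", "love"]
NEGATIVE = ["bad", "awful", "terrible", "hate"]
NEUTRAL = ["the", "movie", "plot", "was", "acting", "and", "film"]
VOCAB = NEUTRAL + POSITIVE + NEGATIVE


def victim_api(text):
    """Black-box victim: the attacker sees only the returned label (1 or 0)."""
    tokens = text.split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return 1 if score > 0 else 0


def sample_queries(rng, n, length=8):
    """Attacker-side query pool: random token sequences, no gold labels needed."""
    return [" ".join(rng.choice(VOCAB) for _ in range(length)) for _ in range(n)]


def predict(model, text):
    """Surrogate prediction from (weights, bias) over token counts."""
    weights, bias = model
    score = bias + sum(weights[t] for t in text.split())
    return 1 if score > 0 else 0


def extract(queries, max_epochs=3000):
    """Label the queries with the victim, then fit a perceptron on those labels."""
    labelled = [(q, victim_api(q)) for q in queries]
    weights, bias = Counter(), 0
    for _ in range(max_epochs):
        mistakes = 0
        for text, label in labelled:
            step = label - predict((weights, bias), text)  # -1, 0, or +1
            if step:
                mistakes += 1
                bias += step
                for token in text.split():
                    weights[token] += step
        if mistakes == 0:  # surrogate now reproduces every victim label it saw
            break
    return weights, bias


def fidelity(model, texts):
    """Fraction of inputs on which the surrogate agrees with the victim."""
    return sum(predict(model, t) == victim_api(t) for t in texts) / len(texts)


def transfer_attack(model, text):
    """Find a single-token substitution that flips the surrogate, else None."""
    tokens = text.split()
    original = predict(model, text)
    for i in range(len(tokens)):
        for candidate in VOCAB:
            perturbed = " ".join(tokens[:i] + [candidate] + tokens[i + 1:])
            if predict(model, perturbed) != original:
                return perturbed
    return None


rng = random.Random(0)
surrogate = extract(sample_queries(rng, 300))
held_out = sample_queries(rng, 200)
print("held-out fidelity:", fidelity(surrogate, held_out))

# Craft perturbations using the surrogate only, then replay them on the victim.
pairs = [(t, transfer_attack(surrogate, t)) for t in held_out]
found = [(t, a) for t, a in pairs if a is not None]
transferred = sum(victim_api(a) != victim_api(t) for t, a in found)
print("perturbations that also flip the victim:", transferred, "of", len(found))
```

The two printed numbers correspond loosely to the quantities the paper reports: agreement between the extracted and victim models, and the share of adversarial examples built on the extracted model that also fool the victim. In this toy setting the surrogate and victim are both linear, so the figures say nothing about how a real BERT service would fare.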
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Network Security and Intrusion Detection
Methods: Linear Layer · Adam · Attention Is All You Need · Attention Dropout · Layer Normalization · WordPiece · Residual Connection · Linear Warmup With Linear Decay
