Revealing Secrets From Pre-trained Models
Mujahid Al Rafi, Yuan Feng, Hyeran Jeon

TL;DR
This paper uncovers security vulnerabilities in transfer-learned models such as BERT by showing that the weights of pre-trained and fine-tuned models are highly similar, which enables model extraction attacks.
Contribution
The paper introduces a novel model extraction attack that exploits weight similarities and vendor-specific computing patterns to reveal a model's architecture and weights.
Findings
Pre-trained and fine-tuned models have high weight similarity.
Vendor-specific computing patterns exist even for identical models.
The proposed attack can accurately extract model architecture and weights.
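The first finding can be illustrated with a small sketch. It is not the authors' method or metric: it assumes cosine similarity over flattened weights as the similarity measure, and it simulates fine-tuning as a small random perturbation of a pre-trained weight vector. The names `pretrained_a`, `pretrained_b`, `fine_tuned`, and `closest_candidate` are hypothetical and exist only for this example.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length flat weight vectors."""
    if len(a) != len(b):
        raise ValueError("weight vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def closest_candidate(target, candidates):
    """Return the name of the candidate most similar to the target weights."""
    return max(candidates, key=lambda name: cosine_similarity(target, candidates[name]))


rng = random.Random(0)
n = 1000

# Hypothetical stand-ins for two publicly released pre-trained checkpoints.
pretrained_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
pretrained_b = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Assumption: fine-tuning only nudges the weights of checkpoint A slightly.
fine_tuned = [w + rng.gauss(0.0, 0.05) for w in pretrained_a]

sim_a = cosine_similarity(fine_tuned, pretrained_a)
sim_b = cosine_similarity(fine_tuned, pretrained_b)
best = closest_candidate(
    fine_tuned, {"pretrained_a": pretrained_a, "pretrained_b": pretrained_b}
)

print(f"similarity to A: {sim_a:.4f}")
print(f"similarity to B: {sim_b:.4f}")
print(f"closest pre-trained candidate: {best}")
```

Under these assumptions the fine-tuned vector stays almost parallel to the checkpoint it started from and nearly orthogonal to the unrelated one, which is why knowing the public pre-trained weights would give an attacker a strong starting point.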
Abstract
With the growing burden of training deep learning models with large data sets, transfer-learning has been widely adopted in many emerging deep learning algorithms. Transformer models such as BERT are the main players in natural language processing and use transfer-learning as a de facto standard training method. A few big data companies release models pre-trained on a few popular datasets, which end users and researchers then fine-tune with their own datasets. Transfer-learning significantly reduces the time and effort of training models. However, it comes at the cost of security concerns. In this paper, we show a new observation that pre-trained models and fine-tuned models have significantly high similarities in weight values. Also, we demonstrate that there exist vendor-specific computing patterns even for the same models. With these new findings, we…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Layer Normalization · Linear Warmup With Linear Decay · Residual Connection · Attention Dropout
