A Survey on Model Extraction Attacks and Defenses for Large Language Models
Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong

TL;DR
This survey comprehensively reviews model extraction attacks and defenses for large language models, categorizing methods, analyzing effectiveness, and proposing new evaluation metrics and research directions.
Contribution
It provides a detailed taxonomy of LLM-specific attacks and defenses, introduces specialized metrics, and highlights key limitations and future research avenues.
Findings
API-based knowledge distillation is effective for extraction.
Current defenses have significant limitations in real-world scenarios.
Adaptive, integrated defense mechanisms show promise.
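As a rough illustration of the first finding, the objective behind API-based knowledge distillation can be sketched as a divergence between the target model's output distribution and a student's. This is a minimal sketch, not the survey's method: it assumes the queried API exposes next-token probabilities (many expose only text or top-k log-probabilities), and the names softmax and distillation_loss are hypothetical helpers chosen for this example.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_probs, student_logits, temperature=1.0):
    # KL(teacher || student) for a single next-token distribution.
    # teacher_probs is assumed to come from the queried model's API.
    student_probs = softmax(student_logits, temperature)
    return sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )


# Hypothetical teacher distribution over a three-token vocabulary.
teacher = softmax([2.0, 1.0, 0.1])
matched = distillation_loss(teacher, [2.0, 1.0, 0.1])
mismatched = distillation_loss(teacher, [0.0, 0.0, 0.0])
```

The loss is close to zero when the student reproduces the teacher's distribution and grows as the two diverge; a student trained to minimize it over many queries comes to imitate the target's behavior.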
Abstract
Model extraction attacks pose significant security threats to deployed language models, potentially compromising intellectual property and user privacy. This survey provides a comprehensive taxonomy of LLM-specific extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks. We analyze various attack methodologies including API-based knowledge distillation, direct querying, parameter recovery, and prompt stealing techniques that exploit transformer architectures. We then examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios. We propose specialized metrics for evaluating both attack effectiveness and defense performance, addressing the specific challenges of generative language…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Web Application Security Vulnerabilities
