"Yes, My LoRD." Guiding Language Model Extraction with Locality Reinforced Distillation

Zi Liang; Qingqing Ye; Yanyun Wang; Sen Zhang; Yaxin Xiao; Ronghua Li; Jianliang Xu; Haibo Hu

arXiv:2409.02718·cs.CR·May 20, 2025

TL;DR

LoRD (Locality Reinforced Distillation) is a model extraction method designed specifically for large language models. By aligning the extraction objective with the way LLMs are themselves aligned during training, it aims to extract more effectively, and with fewer queries, than methods adapted from general DNN extraction.

Contribution

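The paper presents LoRD, a model extraction algorithm for LLMs built on a policy-gradient-style training task in which the victim model's responses guide how the local model forms its preferences. Theoretical analysis supports the design, covering both its consistency with LLM alignment and its reduced query complexity.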

Findings

01 LoRD outperforms existing extraction methods on commercial LLMs.

02 LoRD reduces query complexity and mitigates watermark protection.

03 Theoretical analysis confirms convergence and alignment with LLM training procedures.

Abstract

Model extraction attacks (MEAs) on large language models (LLMs) have received increasing attention in recent research. However, existing attack methods typically adapt extraction strategies originally developed for deep neural networks (DNNs). They neglect the underlying inconsistency between the training tasks of MEA and LLM alignment, which leads to suboptimal attack performance. To tackle this issue, we propose Locality Reinforced Distillation (LoRD), a novel model extraction algorithm specifically designed for LLMs. In particular, LoRD employs a newly defined policy-gradient-style training task that uses the victim model's responses as the signal guiding how preferences are crafted for the local model. Theoretical analyses demonstrate that (I) the convergence procedure of LoRD in model extraction is consistent with the alignment procedure of LLMs, and (II) LoRD can reduce query…
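The idea of using a victim's responses as a policy-gradient signal can be illustrated with a deliberately tiny toy. The sketch below is not LoRD's training objective, which this page does not give; it is plain REINFORCE on single-token "responses", with a made-up stand-in for the victim. Every name in it (`victim_respond`, `extract`, the reward and baseline choices) is a hypothetical introduced only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def victim_respond(prompt):
    """Hypothetical black-box victim: maps a prompt to one token id in [0, 5)."""
    return len(prompt) % 5


def extract(prompts, vocab_size=5, steps=2000, lr=0.5, seed=0):
    """Toy REINFORCE loop: the local policy is rewarded when its sampled
    token matches the victim's cached response for the same prompt."""
    rng = random.Random(seed)
    # Assumption: the victim is queried once per prompt and the answer is reused.
    victim = {p: victim_respond(p) for p in prompts}
    # The "local model" is just one logit vector per prompt.
    logits = {p: [0.0] * vocab_size for p in prompts}

    for _ in range(steps):
        p = rng.choice(prompts)
        probs = softmax(logits[p])
        action = rng.choices(range(vocab_size), weights=probs)[0]
        reward = 1.0 if action == victim[p] else 0.0
        # Baseline = expected reward under the current policy (variance reduction).
        advantage = reward - probs[victim[p]]
        # Gradient of log pi(action) w.r.t. the logits is onehot(action) - probs.
        for k in range(vocab_size):
            grad = (1.0 if k == action else 0.0) - probs[k]
            logits[p][k] += lr * advantage * grad

    return {p: softmax(logits[p]) for p in prompts}, victim


if __name__ == "__main__":
    policy, victim = extract(["hi", "hello", "hey!"])
    for prompt, dist in policy.items():
        best = max(range(len(dist)), key=dist.__getitem__)
        print(prompt, "victim:", victim[prompt], "local argmax:", best)
```

In this toy the only training signal is whether the local model's own sample agrees with the victim, so the local model learns from its exploration rather than from a supervised loss on the victim's text. A real extraction setting involves full sequences and a far richer objective than this match-or-not reward.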

Taxonomy

Topics: Topic Modeling

Methods: Softmax · Attention Is All You Need