Model Stealing for Any Low-Rank Language Model
Allen Liu, Ankur Moitra

TL;DR
This paper gives an efficient algorithm, in the conditional query model, for stealing any low-rank language model, a class that includes Hidden Markov Models. It improves on earlier work by dropping the requirement that the unknown distribution have high fidelity.
Contribution
The paper introduces an efficient algorithm, in the conditional query model, for learning any low-rank distribution, and uses this tractable setting to build a theoretical understanding of model stealing for language models.
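To make "low-rank" concrete, here is a minimal sketch under one common reading of the term: the matrix whose rows are histories, whose columns are futures, and whose entries are the conditional probabilities P(future | history) has rank at most the number of hidden states. The 2-state HMM, its parameters, and every function name below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from fractions import Fraction
from itertools import product

# Hypothetical 2-state HMM over tokens {0, 1}; exact rationals avoid rounding in the rank check.
INIT = [Fraction(3, 5), Fraction(2, 5)]                          # initial hidden-state distribution
TRANS = [[Fraction(7, 10), Fraction(3, 10)],                     # TRANS[i][j] = P(next state j | state i)
         [Fraction(1, 5), Fraction(4, 5)]]
EMIT = [[Fraction(9, 10), Fraction(1, 10)],                      # EMIT[i][o] = P(token o | state i)
        [Fraction(3, 10), Fraction(7, 10)]]
N_STATES, N_TOKENS = 2, 2

def sequence_weights(belief, seq):
    """Unnormalized state weights after emitting seq from belief; their sum is P(seq | belief)."""
    for tok in seq:
        emitted = [belief[i] * EMIT[i][tok] for i in range(N_STATES)]
        belief = [sum(emitted[i] * TRANS[i][j] for i in range(N_STATES))
                  for j in range(N_STATES)]
    return belief

def conditional_matrix(t_past, t_future):
    """Rows: histories of length t_past. Columns: futures of length t_future. Entries: P(future | history)."""
    futures = list(product(range(N_TOKENS), repeat=t_future))
    rows = []
    for past in product(range(N_TOKENS), repeat=t_past):
        weights = sequence_weights(INIT, past)
        total = sum(weights)  # positive here because every emission probability is positive
        belief = [w / total for w in weights]
        rows.append([sum(sequence_weights(belief, fut)) for fut in futures])
    return rows

def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

M = conditional_matrix(3, 3)   # 8 histories by 8 futures
print(rank(M))                 # at most N_STATES, although M is 8 x 8
```

Each row of the matrix is the posterior over hidden states given that history, multiplied by a fixed states-by-futures matrix, which is why the rank cannot exceed the number of states regardless of how long the histories and futures are.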
Findings
Learns any low-rank distribution efficiently in the conditional query model
Removes the fidelity requirement imposed by previous results
Represents the model with barycentric spanners and relies on convex optimization
Abstract
Model stealing, where a learner tries to recover an unknown model via carefully chosen queries, is a critical problem in machine learning, as it threatens the security of proprietary models and the privacy of data they are trained on. In recent years, there has been particular interest in stealing large language models (LLMs). In this paper, we aim to build a theoretical understanding of stealing language models by studying a simple and mathematically tractable setting. We study model stealing for Hidden Markov Models (HMMs), and more generally low-rank language models. We assume that the learner works in the conditional query model, introduced by Kakade, Krishnamurthy, Mahajan and Zhang. Our main result is an efficient algorithm in the conditional query model, for learning any low-rank distribution. In other words, our algorithm succeeds at stealing any language model whose output…
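As an illustration of the conditional query model mentioned in the abstract, the sketch below assumes a query means: the learner supplies a history of tokens and receives a sample of future tokens drawn from the model's conditional distribution. The toy HMM, its parameters, and the function names are hypothetical; this shows only the oracle a learner would talk to, not the paper's learning algorithm.

```python
import random

# Hypothetical 2-state HMM over tokens {0, 1}; all numbers are illustrative.
INIT = [0.6, 0.4]                      # initial hidden-state distribution
TRANS = [[0.7, 0.3], [0.2, 0.8]]       # TRANS[i][j] = P(next state j | state i)
EMIT = [[0.9, 0.1], [0.3, 0.7]]        # EMIT[i][o] = P(token o | state i)

def state_posterior(prefix):
    """Forward algorithm: distribution of the hidden state that emits the token after prefix."""
    belief = INIT[:]
    for tok in prefix:
        weighted = [belief[i] * EMIT[i][tok] for i in range(len(belief))]
        total = sum(weighted)
        if total == 0:
            raise ValueError("prefix has probability zero under the model")
        cond = [w / total for w in weighted]
        belief = [sum(cond[i] * TRANS[i][j] for i in range(len(cond)))
                  for j in range(len(cond))]
    return belief

def next_token_distribution(prefix):
    """Exact P(next token | prefix), obtained by mixing emissions over the state posterior."""
    belief = state_posterior(prefix)
    return [sum(belief[i] * EMIT[i][o] for i in range(len(belief)))
            for o in range(len(EMIT[0]))]

def conditional_query(prefix, length, rng):
    """Sample `length` future tokens conditioned on the given history."""
    belief = state_posterior(prefix)
    state = rng.choices(range(len(belief)), weights=belief)[0]
    future = []
    for _ in range(length):
        future.append(rng.choices(range(len(EMIT[state])), weights=EMIT[state])[0])
        state = rng.choices(range(len(TRANS[state])), weights=TRANS[state])[0]
    return future

rng = random.Random(0)
print(next_token_distribution([0, 1, 1]))   # exact next-token probabilities after the history
print(conditional_query([0, 1, 1], 5, rng)) # one sampled continuation of length 5
```

The learner in this setting sees only the sampled continuations, never INIT, TRANS, or EMIT; the stealing problem is to reconstruct a model with nearly the same output distribution from such queries.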
Taxonomy
Topics: Natural Language Processing Techniques
