Logits of API-Protected LLMs Leak Proprietary Information
Matthew Finlayson, Xiang Ren, Swabha Swayamdipta

TL;DR
This paper demonstrates that proprietary, API-protected LLMs leak significant non-public information through their logits because of the softmax bottleneck, which enables several extraction and auditing capabilities at low cost.
Contribution
It reveals a novel vulnerability in API-protected LLMs caused by the softmax bottleneck, enabling information leakage and model analysis from limited API queries.
Findings
The embedding size of GPT-3.5-turbo is estimated to be about 4096.
Full-vocabulary outputs can be recovered at low cost.
Specific types of model updates can be detected, and the source LLM can be identified from a single full output.
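The embedding-size finding rests on a rank argument that can be sketched on a toy model. The snippet below is an illustration under stated assumptions, not the authors' procedure: the matrices `W` and `H`, the toy sizes, and the helper `estimate_hidden_size` are all hypothetical, and the example works on raw logits rather than on log-probabilities returned by a real API. If every output is a linear image of a hidden vector of size d, then any stack of outputs has rank at most d, so counting the non-negligible singular values gives an estimate of d.

```python
import numpy as np

def estimate_hidden_size(outputs, rel_tol=1e-8):
    """Count singular values that are non-negligible relative to the largest one."""
    s = np.linalg.svd(outputs, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(0)

# Toy sizes chosen for illustration only; they are not real model dimensions.
vocab_size, hidden_size, n_queries = 500, 16, 64

W = rng.standard_normal((vocab_size, hidden_size))  # stand-in output projection
H = rng.standard_normal((hidden_size, n_queries))   # one stand-in hidden state per query
logits = W @ H                                      # shape: (vocab_size, n_queries)

# More queries than the hidden size are needed for the rank to saturate.
estimated = estimate_hidden_size(logits)
print(estimated)
```

In this toy setting the estimate equals `hidden_size` as long as `n_queries` exceeds it. With real API outputs, numerical noise and limited precision would presumably make the drop in singular values less sharp, so the threshold `rel_tol` would need care.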
Abstract
Large language model (LLM) providers often hide the architectural details and parameters of their proprietary models by restricting public access to a limited API. In this work we show that, with only a conservative assumption about the model architecture, it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries (e.g., costing under $1000 USD for OpenAI's gpt-3.5-turbo). Our findings are centered on one key observation: most modern LLMs suffer from a softmax bottleneck, which restricts the model outputs to a linear subspace of the full output space. We exploit this fact to unlock several capabilities, including (but not limited to) obtaining cheap full-vocabulary outputs, auditing for specific types of model updates, identifying the source LLM given a single full LLM output, and even…
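The linear-subspace observation in the abstract suggests why full-vocabulary outputs can be cheap to obtain. The sketch below is a toy illustration under assumptions, not the paper's method: `W`, `toy_logits`, and the sizes are hypothetical, and it operates on raw logits, whereas a real API returns log-probabilities for a limited set of tokens. The idea is that once a basis of full outputs is known, a new output can be reconstructed from a small number of its entries by solving a linear system.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sizes for illustration only.
vocab_size, hidden_size = 300, 8

W = rng.standard_normal((vocab_size, hidden_size))  # stand-in output projection

def toy_logits(h):
    """Hypothetical model head: logits are a linear image of the hidden state."""
    return W @ h

# Step 1: collect hidden_size full outputs once; together they span the output subspace.
basis = np.column_stack(
    [toy_logits(rng.standard_normal(hidden_size)) for _ in range(hidden_size)]
)  # shape: (vocab_size, hidden_size)

# Step 2: for a new query, pretend only a few entries of the output are observable.
target = toy_logits(rng.standard_normal(hidden_size))
observed_idx = rng.choice(vocab_size, size=3 * hidden_size, replace=False)

# Step 3: solve for the coefficients of the target in the basis using observed entries only.
coeffs, *_ = np.linalg.lstsq(basis[observed_idx], target[observed_idx], rcond=None)

# Step 4: expand the coefficients back to the full vocabulary.
reconstructed = basis @ coeffs
max_err = float(np.max(np.abs(reconstructed - target)))
print(max_err)
```

Here 24 observed entries are enough to recover all 300, because the unknowns are the 8 basis coefficients rather than the 300 logits. For log-probabilities the same argument applies with one extra dimension, since the softmax normalization subtracts the same constant from every entry.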
Taxonomy
Topics: Service-Oriented Architecture and Web Services · Digital Rights Management and Security · Business Process Modeling and Analysis
Methods: Softmax
