TL;DR
This paper challenges the reliance on indistinguishability as a proxy for privacy in LLM APIs, introducing a new inextractability measure that better captures extraction risks and providing practical evaluation methods.
Contribution
It formalizes a new inextractability definition, develops an extraction risk estimator, and empirically evaluates extraction risks across models and configurations.
Findings
Indistinguishability does not imply inextractability in LLMs.
The proposed estimator effectively measures extraction risk.
Empirical results clarify the connection between extractability and distinguishability.
Abstract
Indistinguishability properties such as differential privacy bounds or low empirically measured membership inference are widely treated as proxies to show a model is sufficiently protected against broader memorization risks. However, we show that indistinguishability properties are neither sufficient nor necessary for preventing data extraction in LLM APIs. We formalize a privacy-game separation between extraction and indistinguishability-based privacy, showing that indistinguishability and inextractability are incomparable: upper-bounding distinguishability does not upper-bound extractability. To address this gap, we introduce a parameterized inextractability definition that requires at least a specified expected number of queries for any black-box adversary to induce the LLM API to emit a protected substring of a given length. We instantiate this via a worst-case extraction game and derive a rank-based…
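The query-count reading of inextractability can be made concrete with a toy calculation. The sketch below is not the paper's rank-based estimator; it assumes a simplified adversary that repeats one fixed prompt and receives independently sampled completions, so the number of queries until the protected substring first appears is geometrically distributed. The function names and the per-token probabilities are hypothetical, chosen only for illustration.

```python
import math


def sequence_probability(step_probs):
    """Probability that one sampled completion emits the whole protected
    substring, given each token's conditional probability (toy values)."""
    return math.prod(step_probs)


def expected_queries(p_single_query):
    """Expected number of independent queries until the first success.

    Assumes every query succeeds independently with the same probability,
    so the query count follows a geometric distribution with mean 1/p.
    """
    if not 0.0 < p_single_query <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / p_single_query


# Hypothetical conditional probabilities for a 3-token protected substring.
step_probs = [0.5, 0.25, 0.5]
p = sequence_probability(step_probs)
queries = expected_queries(p)
bits = math.log2(queries)  # the same cost expressed on a log2 scale

print(f"per-query success probability: {p}")
print(f"expected queries: {queries}")
print(f"log2(expected queries): {bits}")
```

Under these assumptions, lowering any single token's probability raises the expected query cost multiplicatively, which is why an extraction measure of this kind can move independently of a membership-inference score.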
