Better Language Model Inversion by Compactly Representing Next-Token Distributions

Murtaza Nazir; Matthew Finlayson; John X. Morris; Xiang Ren; Swabha Swayamdipta

arXiv:2506.17090 · cs.CL · December 12, 2025


TL;DR

This paper introduces PILS, a language model inversion method that exploits the low-dimensional subspace occupied by a model's next-token probability outputs to substantially improve prompt recovery accuracy and generalization. Its success raises security concerns for deployed language models.

Contribution

We propose PILS, a linear compression-based approach that enhances prompt inversion by exploiting low-dimensional structures in language model outputs, outperforming prior methods.

Findings

1. Achieves 2-3.5x higher prompt recovery rates than previous methods.
2. Generalizes well when the number of generation steps is increased.
3. Effectively recovers hidden system messages, highlighting security vulnerabilities.

Abstract

Language model inversion seeks to recover hidden prompts using only language model outputs. This capability has implications for security and accountability in language model deployments, such as leaking private information from an API-protected language model's system message. We propose a new method -- prompt inversion from logprob sequences (PILS) -- that recovers hidden prompts by gleaning clues from the model's next-token probabilities over the course of multiple generation steps. Our method is enabled by a key insight: The vector-valued outputs of a language model occupy a low-dimensional subspace. This enables us to losslessly compress the full next-token probability distribution over multiple generation steps using a linear map, allowing more output information to be used for inversion. Our approach yields massive gains over previous state-of-the-art methods for recovering…
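The key insight in the abstract can be checked numerically. Below is a minimal sketch, assuming a standard softmax output layer with a toy random unembedding matrix `W` (vocabulary size `V`, hidden size `d`). All names and sizes are hypothetical, and the SVD-derived basis is a generic stand-in, not necessarily the linear map the authors use. Each step's log-probability vector equals `W h` minus a per-step normalizing constant times the all-ones vector, so every output lies in a subspace of dimension at most `d + 1`. Projecting onto a basis of that subspace therefore compresses `V` numbers per step to at most `d + 1` numbers without loss.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d = 1000, 16                      # toy vocabulary and hidden sizes (hypothetical)
W = rng.standard_normal((V, d))      # toy unembedding matrix (assumption)

def next_token_logprobs(h):
    """Numerically stable log-softmax of W @ h for a batch of hidden states h (n x d)."""
    logits = h @ W.T                                        # (n, V)
    m = logits.max(axis=1, keepdims=True)
    return logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))

# Simulate full log-prob outputs over T generation steps.
T = 64
H = rng.standard_normal((T, d))
L = next_token_logprobs(H)                                  # (T, V)

# Each row is W h_t - c_t * 1, so the rows lie in span(columns of W, all-ones vector).
# Recover an orthonormal basis of that subspace from the data via SVD.
U, S, Vt = np.linalg.svd(L, full_matrices=False)
rank = int((S > S[0] * 1e-10).sum())                        # numerical rank
B = Vt[:rank].T                                             # (V, rank) basis

# Linear compression: V numbers per step -> rank numbers per step.
Z = L @ B                                                   # (T, rank) compressed outputs
L_rec = Z @ B.T                                             # linear reconstruction

print(rank)                      # bounded by d + 1 by the argument above
print(np.allclose(L, L_rec))     # reconstruction is exact up to float error
```

Because the compression is an orthogonal projection onto a subspace the outputs already occupy, the original distributions can be reconstructed exactly, up to floating-point error. This is what allows information from many generation steps to be kept in compact form for an inverter.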


Code & Models

Models

dill-lab/pils-32-llama2-chat-7b (Hugging Face model, 80 downloads)


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Security and Verification in Computing