Language Model Inversion through End-to-End Differentiation
Kevin Yandoka Denamganaï, Kartic Subr

TL;DR
This paper introduces a gradient-based method for inverting language models: it optimizes a prompt so that a frozen model produces a desired output. The key idea is to treat the model as a function on sequences of token distributions rather than on sequences of tokens, which makes the whole pipeline differentiable.
Contribution
It presents a novel end-to-end differentiable framework for language model inversion, allowing prompt optimization via gradient descent on frozen models.
Findings
Effective prompt optimization for target outputs of length 20
Works reliably on several white-box language models
Handles prompt lengths up to 80 tokens
Abstract
Despite emerging research on Language Models (LMs), few approaches analyse the invertibility of LMs. That is, given an LM and a desirable target output sequence of tokens, determining what input prompts would yield the target output remains an open problem. We formulate this problem as a classical gradient-based optimisation. First, we propose a simple algorithm to achieve end-to-end differentiability of a given (frozen) LM and then find optimised prompts via gradient descent. Our central insight is to view LMs as functions operating on sequences of distributions over tokens (rather than the traditional view as functions on sequences of tokens). Our experiments and ablations demonstrate that our DLM-powered inversion can reliably and efficiently optimise prompts of lengths up to 80 tokens for targets of length 20, for several white-box LMs (out-of-the-box).
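To make the "distributions over tokens" view concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the frozen "model" is a hypothetical 3-token linear scorer (the matrix M), the gradients are written out by hand instead of using automatic differentiation, and only a single output token is targeted. All names (M, model_logits, loss_and_grads, invert) are assumptions made for illustration. What it shows is the core move: each prompt position holds a softmax distribution over the vocabulary, the model consumes those distributions directly, and gradient descent on the underlying logits finds a prompt that yields the target.

```python
import math

V = 3  # toy vocabulary size (assumption)

# Hypothetical frozen "model": prompt token j adds M[j][k] to the logit of output token k.
M = [[0.0, 2.0, 0.0],
     [0.0, 0.0, 2.0],
     [2.0, 0.0, 0.0]]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def model_logits(prompt_dists):
    # The model takes distributions over tokens, one per prompt position.
    # A one-hot distribution recovers the usual discrete-token behaviour.
    return [sum(p[j] * M[j][k] for p in prompt_dists for j in range(V))
            for k in range(V)]

def loss_and_grads(thetas, target):
    # thetas: free logits per prompt position; softmax turns them into distributions.
    dists = [softmax(t) for t in thetas]
    q = softmax(model_logits(dists))
    loss = -math.log(q[target])  # cross-entropy against the target token
    g = [q[k] - (1.0 if k == target else 0.0) for k in range(V)]  # dL/dlogits
    grads = []
    for p in dists:
        dl_dp = [sum(M[j][k] * g[k] for k in range(V)) for j in range(V)]
        mean = sum(p[j] * dl_dp[j] for j in range(V))
        # Chain rule through the softmax that parameterises this position.
        grads.append([p[j] * (dl_dp[j] - mean) for j in range(V)])
    return loss, grads

def invert(target, prompt_len=2, steps=200, lr=1.0):
    thetas = [[0.0] * V for _ in range(prompt_len)]  # start from uniform distributions
    initial_loss, _ = loss_and_grads(thetas, target)
    for _ in range(steps):
        _, grads = loss_and_grads(thetas, target)
        thetas = [[t - lr * d for t, d in zip(th, gr)]
                  for th, gr in zip(thetas, grads)]
    final_loss, _ = loss_and_grads(thetas, target)
    # Decode a discrete prompt by taking the most likely token at each position.
    prompt = [max(range(V), key=lambda j: th[j]) for th in thetas]
    return prompt, initial_loss, final_loss

def one_hot(j):
    return [1.0 if i == j else 0.0 for i in range(V)]

prompt, initial_loss, final_loss = invert(target=2)
hard_logits = model_logits([one_hot(j) for j in prompt])
predicted = max(range(V), key=lambda k: hard_logits[k])
```

In this toy setting the decoded prompt, fed back as one-hot tokens, makes the model predict the target token. A real LM would need the same relaxation applied at the embedding layer and through autoregressive decoding, which this sketch does not attempt.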
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Generative Adversarial Networks and Image Synthesis
