Embedding Inversion via Conditional Masked Diffusion Language Models
Han Xiao

TL;DR
This paper frames embedding inversion as conditional masked diffusion: all tokens are recovered in parallel in 8 forward passes, with no encoder access at inference time and no iterative correction, in place of sequential autoregressive generation.
Contribution
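The authors propose a conditional masked diffusion approach to embedding inversion. A masked diffusion language model, conditioned on the target embedding via adaptive layer normalization, recovers all tokens in parallel in 8 forward passes, without access to the target encoder at inference time.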
Findings
Achieves token recovery with only 8 forward passes.
Operates without target-encoder access at inference time, iterative correction, or architecture-specific alignment.
Evaluated on 32-token sequences across three embedding models.
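To make the "8 forward passes" figure concrete, here is a minimal sketch of parallel iterative unmasking. It is not the paper's implementation: `toy_denoiser` is a hypothetical stand-in for the conditioned model, `MASK` is a made-up mask id, and the confidence-ranked, equal-share unmasking schedule is an assumption, since the abstract does not specify how positions are revealed.

```python
import math
import random

MASK = -1  # hypothetical id standing in for the [MASK] token


def toy_denoiser(tokens, embedding, vocab_size, rng):
    """Hypothetical stand-in for the conditioned model.

    Returns one (predicted_token, confidence) pair per position.
    A real model would run a single forward pass over the whole sequence.
    """
    preds = []
    for pos in range(len(tokens)):
        # Toy prediction derived from the conditioning vector only.
        tok = int(abs(embedding[pos % len(embedding)]) * 1000) % vocab_size
        preds.append((tok, rng.random()))
    return preds


def decode(embedding, seq_len=32, num_steps=8, vocab_size=100, seed=0):
    """Start fully masked; reveal the most confident positions each step."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    passes = 0
    for step in range(num_steps):
        preds = toy_denoiser(tokens, embedding, vocab_size, rng)
        passes += 1
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Assumed schedule: an equal share of the remaining masks per step,
        # so nothing is left masked after the final step.
        k = math.ceil(len(masked) / (num_steps - step))
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:k]:
            tokens[i] = preds[i][0]
    return tokens, passes


tokens, passes = decode([0.12, -0.5, 0.33])
print(passes, len(tokens), MASK in tokens)
```

With 32 positions and 8 steps, this schedule commits 4 tokens per pass, so the number of model calls is fixed by `num_steps` rather than by sequence length, which is the contrast with token-by-token autoregressive decoding.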
Abstract
We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusion language model is conditioned on the target embedding via adaptive layer normalization, requiring only 8 forward passes with no access to the target encoder at inference time. On 32-token sequences across three embedding models, the method achieves token recovery through parallel denoising without requiring encoder access, iterative correction, or architecture-specific alignment. Source code and live demo are available at https://github.com/jina-ai/embedding-inversion-demo.
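The conditioning mechanism named in the abstract, adaptive layer normalization, can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the conditioning vector is mapped by two hypothetical linear projections to a per-feature scale and shift, and the `(1 + scale)` form follows a common convention rather than anything stated in the abstract.

```python
import math


def layer_norm(x, eps=1e-5):
    """Plain layer normalization without learned affine parameters."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weights, bias, vec):
    """Dense layer: one weight row per output feature."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def adaptive_layer_norm(hidden, cond, w_scale, b_scale, w_shift, b_shift):
    """Normalize `hidden`, then modulate it with scale/shift predicted from `cond`.

    `cond` plays the role of the target embedding; the projection weights
    are hypothetical parameters that would be learned during training.
    """
    scale = linear(w_scale, b_scale, cond)
    shift = linear(w_shift, b_shift, cond)
    normed = layer_norm(hidden)
    # Assumed convention: zero-initialized projections give plain layer norm.
    return [n * (1.0 + s) + t for n, s, t in zip(normed, scale, shift)]


hidden = [1.0, 2.0, 3.0, 4.0]
cond = [0.5, -0.5]
zeros_w = [[0.0, 0.0] for _ in hidden]
zeros_b = [0.0] * len(hidden)
out = adaptive_layer_norm(hidden, cond, zeros_w, zeros_b, zeros_w, zeros_b)
print(out)
```

Because the target embedding enters only through these scale and shift terms, a model conditioned this way needs the embedding vector itself but not the encoder that produced it.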
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Speech Recognition and Synthesis · Topic Modeling
