Zero2Text: Zero-Training Cross-Domain Inversion Attacks on Textual Embeddings

Doohyun Kim; Donghwa Kang; Kyungjae Lee; Hyeongboo Baek; Brent Byunghoon Kang

arXiv:2602.01757 · cs.CL · February 4, 2026

Open Access

TL;DR

Zero2Text is a training-free method based on recursive online alignment. It performs cross-domain inversion attacks on text embeddings and outperforms existing approaches in black-box settings.

Contribution

The paper presents Zero2Text, a framework that combines LLM priors with online alignment to invert embeddings without in-domain training data or a prohibitive number of queries.

Findings

01. Achieves 1.8x higher ROUGE-L on MS MARCO

02. Attains 6.4x higher BLEU-2 scores compared to baselines

03. Successfully recovers sentences from unseen domains

Abstract

The proliferation of retrieval-augmented generation (RAG) has established vector databases as critical infrastructure, yet they introduce severe privacy risks via embedding inversion attacks. Existing paradigms face a fundamental trade-off: optimization-based methods require computationally prohibitive queries, while alignment-based approaches hinge on the unrealistic assumption of accessible in-domain training data. These constraints render them ineffective in strict black-box and cross-domain settings. To dismantle these barriers, we introduce Zero2Text, a novel training-free framework based on recursive online alignment. Unlike methods relying on static datasets, Zero2Text synergizes LLM priors with a dynamic ridge regression mechanism to iteratively align generation to the target embedding on-the-fly. We further demonstrate that standard defenses, such as differential privacy, fail…
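
The abstract mentions a ridge regression mechanism that aligns generation to the target embedding. As a rough illustration only, the sketch below fits a closed-form ridge map between two embedding spaces and uses it to rank candidate texts by cosine similarity to a target embedding. This is not the authors' implementation: the synthetic data, the dimensions, the regularization strength, and the names fit_ridge, cosine, and true_map are all assumptions made for the example, and the paper's recursive, LLM-guided candidate generation is not modeled here.

```python
import numpy as np


def fit_ridge(X, Y, lam=1.0):
    """Closed-form ridge solution W = (X^T X + lam * I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
d_local, d_target = 8, 6  # hypothetical embedding sizes

# Synthetic stand-in for the unknown relation between a local embedding
# space and the victim model's embedding space (assumption for the demo).
true_map = rng.normal(size=(d_local, d_target))

# Paired observations: local embeddings of queried texts (X) and the
# embeddings the victim model returned for the same texts (Y).
X = rng.normal(size=(40, d_local))
Y = X @ true_map + 0.01 * rng.normal(size=(40, d_target))

# Fit the alignment map from the pairs collected so far.
W = fit_ridge(X, Y, lam=1e-2)

# Rank new candidates by how close their projected embedding is to the target.
candidates = rng.normal(size=(5, d_local))
target = candidates[3] @ true_map  # the target corresponds to candidate 3
scores = [cosine(c @ W, target) for c in candidates]
best = int(np.argmax(scores))
print("best candidate index:", best)
```

In an online setting, one would presumably refit W each time new (text, embedding) pairs arrive, which is cheap because the closed-form solve only involves a d_local x d_local system.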

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Privacy-Preserving Technologies in Data