Text Embedding Inversion Security for Multilingual Language Models

Yiyi Chen; Heather Lent; Johannes Bjerva

arXiv:2401.12192 · cs.CL · June 6, 2024 · 1 citation

TL;DR

This paper investigates the security risks of text embeddings in multilingual language models, revealing vulnerabilities to inversion attacks and proposing a masking defense to enhance privacy across languages.

Contribution

It introduces the first study of multilingual inversion attacks and evaluates a simple masking defense effective for both monolingual and multilingual models.

Findings

01. Multilingual LLMs may be more vulnerable to inversion attacks.

02. English-based defenses may be ineffective for other languages.

03. A simple masking defense improves security in multilingual settings.

Abstract

Textual data is often represented as real-numbered embeddings in NLP, particularly with the popularity of large language models (LLMs) and Embeddings as a Service (EaaS). However, storing sensitive information as embeddings can be susceptible to security breaches, as research shows that text can be reconstructed from embeddings, even without knowledge of the underlying model. While defence mechanisms have been explored, these are exclusively focused on English, leaving other languages potentially exposed to attacks. This work explores LLM security through multilingual embedding inversion. We define the problem of black-box multilingual and cross-lingual inversion attacks, and explore their potential implications. Our findings suggest that multilingual LLMs may be more vulnerable to inversion attacks, in part because English-based defences may be ineffective. To alleviate this, we…

Code & Models

Repositories

siebeniris/multivec2text (PyTorch, official)

Videos

Text Embedding Inversion Security for Multilingual Language Models · underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning
