Understanding Privacy Risks of Embeddings Induced by Large Language Models

Zhihao Zhu; Ninglu Shao; Defu Lian; Chenwang Wu; Zheng Liu; Yi Yang; Enhong Chen

arXiv:2404.16587 · cs.CL · April 26, 2024 · 1 citation

Open Access

TL;DR

This paper investigates the privacy risks of text embeddings when large language models are employed to analyze them, showing that LLMs reconstruct original knowledge and predict entity attributes from embeddings more accurately than traditional pre-trained models, which raises significant privacy concerns.

Contribution

It provides empirical evidence that LLMs reconstruct text from embeddings more accurately than traditional pre-trained models, highlighting increased privacy risks.

Findings

01

LLMs significantly improve reconstruction accuracy.

02

LLMs increase the risk of privacy breaches.

03

Potential mitigation strategies are discussed.

Abstract

Large language models (LLMs) show early signs of artificial general intelligence but struggle with hallucinations. One promising way to mitigate these hallucinations is to store external knowledge as embeddings that support LLMs through retrieval-augmented generation. However, this approach risks compromising privacy: recent studies have shown experimentally that pre-trained language models can partially reconstruct the original text from its embeddings. The significant advantage of LLMs over traditional pre-trained models may exacerbate these concerns. To this end, we investigate how effectively LLMs can reconstruct original knowledge and predict entity attributes from these embeddings. Empirical findings indicate that LLMs significantly improve accuracy on both evaluated tasks compared with pre-trained models, regardless of whether the texts are…
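To make the attribute-prediction threat concrete, here is a minimal sketch of its simplest form: an adversary who holds only stored embedding vectors, plus a small labeled auxiliary set, averages the auxiliary vectors into one centroid per attribute value and assigns a leaked vector to the nearest centroid by cosine similarity. This is a deliberately simple stand-in, not the paper's method, which employs LLMs as the attack model. The embeddings, labels, and the function names `fit_centroids` and `predict_attribute` are hypothetical toy values chosen for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fit_centroids(embeddings, labels):
    """Hypothetical helper: average the auxiliary embeddings for each attribute label."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[lab][i] += x
        counts[lab] += 1
    return {lab: [x / counts[lab] for x in s] for lab, s in sums.items()}


def predict_attribute(centroids, vec):
    """Hypothetical helper: return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda lab: cosine(centroids[lab], vec))


# Toy 3-dimensional "embeddings" (hypothetical; real text embeddings have hundreds of dimensions).
aux_embeddings = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
aux_labels = ["doctor", "doctor", "lawyer", "lawyer"]
centroids = fit_centroids(aux_embeddings, aux_labels)

# A vector leaked from an embedding store; the attacker never sees its source text.
leaked = [0.85, 0.15, 0.05]
print(predict_attribute(centroids, leaked))  # prints "doctor"
```

Even this linear probe recovers the attribute from the vector alone; the paper's finding is that LLM-based attackers go considerably further, so treating stored embeddings as non-sensitive is unsafe.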

Taxonomy

Topics: Computational and Text Analysis Methods