Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks

Yu-Che Tsai; Hsiang Hsiao; Kuan-Yu Chen; Shou-De Lin

arXiv:2602.07090 · cs.CR · February 10, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces SPARSE, a concept-aware privacy framework for text embeddings that selectively protects sensitive information by identifying privacy-sensitive dimensions and applying calibrated elliptical noise, improving the privacy-utility trade-off.

Contribution

SPARSE is a novel user-centric framework that combines differentiable mask learning and the Mahalanobis mechanism for concept-specific privacy protection in text embeddings.
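The page names differentiable mask learning as one component but does not give its objective. The toy sketch below illustrates the general idea under our own assumptions: each dimension gets a relaxed mask m_i = sigmoid(a_i), trained jointly with a logistic-regression concept probe, an L1 penalty (`lam`) pushes unneeded dimensions toward zero, and a small L2 penalty (`mu`) discourages the probe from inflating its weights to compensate for a shrinking mask. The function name `learn_concept_mask`, the hyperparameters, and the synthetic data are hypothetical; this is not the paper's training objective or dataset construction.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def learn_concept_mask(xs, ys, lam=0.05, mu=0.01, lr=0.5, epochs=300):
    """Jointly fit a relaxed mask m = sigmoid(a) and a logistic probe (w, b).

    Loss = mean BCE(sigmoid(sum_i w_i * m_i * x_i + b), y)
           + lam * sum_i m_i          (sparsity: drop dims the probe does not need)
           + mu / 2 * sum_i w_i ** 2  (discourages w from absorbing the mask's scale)
    Gradients are written by hand so the sketch needs only the stdlib.
    """
    n, d = len(xs), len(xs[0])
    a = [0.0] * d  # mask logits, so every m_i starts at 0.5
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        m = [sigmoid(ai) for ai in a]
        gw, ga, gb = [0.0] * d, [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            s = sum(w[i] * m[i] * x[i] for i in range(d)) + b
            err = sigmoid(s) - y  # d(BCE)/ds
            gb += err
            for i in range(d):
                gw[i] += err * m[i] * x[i]
                ga[i] += err * w[i] * x[i] * m[i] * (1.0 - m[i])
        # Full-batch gradient step; all gradients use the pre-update parameters.
        for i in range(d):
            w_grad = gw[i] / n + mu * w[i]
            a_grad = ga[i] / n + lam * m[i] * (1.0 - m[i])
            w[i] -= lr * w_grad
            a[i] -= lr * a_grad
        b -= lr * gb / n
    return [sigmoid(ai) for ai in a]


# Toy data (hypothetical): dimension 0 encodes the concept, dims 1-5 are noise.
random.seed(0)
xs, ys = [], []
for k in range(200):
    y = k % 2
    x = [random.gauss(0.0, 1.0) for _ in range(6)]
    x[0] = (1.0 if y else -1.0) + random.gauss(0.0, 0.1)
    xs.append(x)
    ys.append(y)

mask = learn_concept_mask(xs, ys)
```

On this toy data the concept dimension ends up with the largest mask value, while the sparsity term pushes the noise dimensions below 0.5. In a SPARSE-like pipeline, a mask of this kind would then decide which dimensions receive most of the noise.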

Findings

01

SPARSE reduces privacy leakage across multiple datasets and models.

02

SPARSE outperforms existing differential privacy methods in utility.

03

SPARSE effectively balances privacy and utility against embedding inversion attacks.

Abstract

Text embeddings enable numerous NLP applications but face severe privacy risks from embedding inversion attacks, which can expose sensitive attributes or reconstruct raw text. Existing differential privacy defenses assume uniform sensitivity across embedding dimensions, leading to excessive noise and degraded utility. We propose SPARSE, a user-centric framework for concept-specific privacy protection in text embeddings. SPARSE combines (1) differentiable mask learning to identify privacy-sensitive dimensions for user-defined concepts, and (2) the Mahalanobis mechanism that applies elliptical noise calibrated by dimension sensitivity. Unlike traditional spherical noise injection, SPARSE selectively perturbs privacy-sensitive dimensions while preserving non-sensitive semantics. Evaluated across six datasets with three embedding models and attack scenarios, SPARSE consistently reduces…
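The abstract contrasts spherical noise with elliptical noise calibrated by dimension sensitivity. The minimal sketch below shows that contrast, assuming a diagonal covariance built from a per-dimension sensitivity mask. The helper names (`mahalanobis_noise`, `privatize`) and the `floor`-based calibration are our own illustration, not the paper's calibration rule. Sampling uses the standard polar construction for a density proportional to exp(-ε‖z‖_M): a Gamma(d, 1/ε) radius times a uniform direction, then stretched by Σ^{1/2}.

```python
import math
import random


def mahalanobis_noise(scales, epsilon, rng=random):
    """Draw z with density proportional to exp(-epsilon * sqrt(z^T Sigma^-1 z)),
    where Sigma = diag(scales[i] ** 2).

    Polar construction: y = r * u with u uniform on the unit sphere and
    r ~ Gamma(shape=d, scale=1/epsilon) has density ~ exp(-epsilon * ||y||_2);
    z = Sigma^(1/2) y then has the elliptical (Mahalanobis-norm) density.
    """
    d = len(scales)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))  # g / norm is uniform on the sphere
    r = rng.gammavariate(d, 1.0 / epsilon)
    return [s * r * v / norm for s, v in zip(scales, g)]


def privatize(embedding, sensitivity_mask, epsilon, floor=0.1, rng=random):
    """Hypothetical calibration: scale 1 on sensitive dims (mask 1), `floor` on
    non-sensitive dims (mask 0), linear in between. All-ones scales would give
    the spherical baseline."""
    scales = [floor + (1.0 - floor) * m for m in sensitivity_mask]
    noise = mahalanobis_noise(scales, epsilon, rng)
    return [e + z for e, z in zip(embedding, noise)]


rng = random.Random(0)
noisy = privatize([0.2, -0.1, 0.4, 0.0], [1.0, 1.0, 0.0, 0.0], epsilon=2.0, rng=rng)
```

With the mask above, the first two coordinates receive noise about ten times larger in scale than the last two, so the noise ellipsoid is stretched only along the dimensions flagged as sensitive.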

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. The paper is well-written and well-structured.
2. The problem and methods are well-defined, especially the dataset construction and the learning objectives in mask learning.
3. The results look promising. SPARSE substantially reduces leakage at the same privacy budget while preserving or improving downstream accuracy compared to spherical-noise baselines across semantic similarity and retrieval tasks (e.g., STS12, FIQA) and against multiple inversion attacks (Vec2Text, GEIA, MLC). On clinical text (MIMIC

Weaknesses

1. The training cost should be clarified and compared with the baselines, since the proposed method involves additional mask learning to identify privacy-sensitive dimensions.
2. The code was not released.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- Obfuscating sensitive concepts in embeddings is a non-trivial problem, and this paper innovatively applies dimension masking and the Mahalanobis mechanism to address it.
- The LDP guarantee of the Mahalanobis-norm mechanism can be connected to the Generalized Laplace Mechanism.
- The authors conducted comprehensive experiments with promising results.
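Reviewer 02's point about connecting the mechanism's privacy guarantee to a Laplace-type mechanism can be illustrated with the standard Mahalanobis-mechanism density. This sketch assumes a fixed positive-definite covariance Σ; the exact Σ used by SPARSE is not given on this page. The normalizing constant does not depend on x because the density is translation-invariant, so the ratio bound follows from the triangle inequality and is a distance-scaled guarantee measured in Mahalanobis distance.

```latex
\|z\|_{M} = \sqrt{z^{\top}\Sigma^{-1}z},
\qquad
p(\tilde{x} \mid x) \propto \exp\bigl(-\varepsilon\,\|\tilde{x} - x\|_{M}\bigr)

% Triangle inequality for \|\cdot\|_M, for any inputs x, x' and output \tilde{x}:
\frac{p(\tilde{x} \mid x)}{p(\tilde{x} \mid x')}
  \le \exp\bigl(\varepsilon\,\|x - x'\|_{M}\bigr)

% Special case \Sigma = I: \|\cdot\|_M = \|\cdot\|_2, i.e. spherical
% multivariate Laplace noise.
```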

Weaknesses

- The framework assumes that sensitive concepts are tied to particular embedding dimensions, which might not be the case: the dimensions related to a concept could change depending on the context.
- SPARSE relies on a predefined concept vocabulary and the corresponding masks. New concepts may emerge in real-world use, and retraining the model for each of them would be computationally intensive.
- In the experiments, the authors use NER to extract sensitive information, which is limited. More complex priva

Reviewer 03 · Rating 2 · Confidence 5

Strengths

- Embedding inversion is relevant to deployed retrieval/RAG systems; aligning protection to user-specified concepts reflects realistic privacy needs beyond coarse PII assumptions.
- The combination of concept-conditioned sparse dimension selection and anisotropic perturbation is a step beyond spherical Laplace noise.
- SPARSE shows consistently lower leakage at comparable or better downstream metrics relative to baselines.

Weaknesses

- The pipeline “user-defined $C$ → NER to extract tokens” inherits NER's false positives/negatives and domain-coverage limitations. The paper instantiates $C$ mostly with NER/PII tokens and acknowledges extensibility, but does not quantify failure modes or robustness to imperfect concept detection.
- Negative samples are built by removing tokens in $C$. This can alter syntax and semantics beyond the concept, potentially making the discrimination task easier in ways not strictly tied to $C$. The classifi
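Reviewer 03's concern about negative samples can be seen in a toy version of the construction it describes: delete every token in $C$ and keep the rest. Whitespace tokenization, the helper name `make_pair`, and the example concept set are hypothetical; the paper's NER-based pipeline is not reproduced here.

```python
def make_pair(text, concept_tokens):
    """Return (positive, negative); the negative drops every token in C."""
    words = text.split()  # hypothetical whitespace tokenization
    kept = [w for w in words if w.strip(".,;:") not in concept_tokens]
    return text, " ".join(kept)


C = {"John", "Smith", "Boston"}  # hypothetical concept set
pos, neg = make_pair("Patient John Smith was admitted to Boston General.", C)
print(neg)  # Patient was admitted to General.
```

Removing "Boston" leaves the fragment "to General.", a change to the sentence that goes beyond the concept itself, which is the kind of shortcut signal the reviewer suggests a classifier could exploit.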

Taxonomy

Topics: Advanced Graph Neural Networks · Privacy-Preserving Technologies in Data · Topic Modeling