Transform Before You Query: A Privacy-Preserving Approach for Vector Retrieval with Embedding Space Alignment
Ruiqi He, Zekun Fei, Jiaqi Li, Xinyuan Zhu, Biao Yi, Siyi Lv, Weijie Liu, Zheli Liu

TL;DR
This paper presents STEER, a privacy-preserving vector retrieval framework that aligns semantic spaces to protect sensitive query data without sacrificing retrieval accuracy.
Contribution
STEER introduces a novel method for privacy-preserving vector retrieval using embedding space alignment, avoiding server modifications and resisting inversion attacks.
Findings
Maintains high retrieval accuracy with less than 5% decrease in Recall@100.
Prevents query text recovery from embeddings, ensuring privacy.
Achieves 20% higher Recall@20 compared to baselines on large datasets.
Abstract
Vector databases (VDBs) efficiently index and search high-dimensional vector embeddings of unstructured data, enabling the fast semantic similarity search that modern AI applications such as generative AI and recommendation systems depend on. Because current VDB service providers predominantly use proprietary black-box models, users must expose raw query text to them via API in exchange for vector retrieval services. Consequently, if the query text involves confidential records from the finance or healthcare domains, this mechanism inevitably leaks users' sensitive information. To address this issue, we introduce STEER (Secure Transformed Embedding vEctor Retrieval), a private vector retrieval framework that leverages the alignment relationship between the semantic spaces of different embedding models to derive…
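The abstract is cut off before it describes how the alignment is derived, so the following is only a minimal sketch of the general idea of aligning two embedding spaces, not STEER's actual procedure. It assumes a plain least-squares linear map fitted on paired "anchor" embeddings, and it uses synthetic random vectors in place of real model outputs. All names (`fit_linear_alignment`, `local_anchor`, `server_anchor`, `true_map`) are hypothetical.

```python
import numpy as np

def fit_linear_alignment(src, tgt):
    """Return W minimizing ||src @ W - tgt|| in the least-squares sense."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W

def normalize(x):
    """Scale each row to unit L2 norm so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_anchor, d_local, d_server = 500, 32, 48

# Synthetic stand-ins (assumption): embeddings of the same anchor texts under
# a local model and under a server-side model. Real spaces are only
# approximately related, which the small noise term imitates.
local_anchor = rng.normal(size=(n_anchor, d_local))
true_map = rng.normal(size=(d_local, d_server))
server_anchor = local_anchor @ true_map + 0.01 * rng.normal(size=(n_anchor, d_server))

# Fit the alignment on the anchor pairs only.
W = fit_linear_alignment(local_anchor, server_anchor)

# Embed a query locally, then map it into the server-side space.
query_local = rng.normal(size=(1, d_local))
query_transformed = normalize(query_local @ W)

# Compare against where the query would land under the synthetic ground truth.
query_reference = normalize(query_local @ true_map)
cosine = float((query_transformed @ query_reference.T)[0, 0])
print(f"cosine between transformed and reference query: {cosine:.4f}")
```

In a setup like this, the client would submit a transformed vector rather than raw query text. Whether STEER uses a linear map, how it obtains anchor pairs, and how it resists inversion attacks are not stated in the text available here.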
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Machine Learning and Algorithms
