Can You Trust the Vectors in Your Vector Database? Black-Hole Attack from Embedding Space Defects
Hanxi Li, Jianan Zhou, Jiale Lao, Yibo Wang, Zhengmao Ye, Yang Cao, Junfen Wang, and Mingjie Tang

TL;DR
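This paper introduces the Black-Hole Attack, a poisoning attack that injects a small number of malicious vectors near the geometric center of a vector database so that they surface in the top-k results of most queries. It traces the vulnerability to a property of high-dimensional embedding spaces and evaluates existing hubness mitigation methods as defenses.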
Contribution
The paper identifies a new security vulnerability in vector databases, caused by centrality-driven hubness, and evaluates existing mitigation strategies against it.
Findings
Malicious vectors appear in up to 99.85% of top-10 results.
Existing hubness mitigation methods either reduce accuracy or offer limited protection.
High-dimensional embedding space properties enable the Black-Hole Attack.
Abstract
Vector databases serve as the retrieval backbone of modern AI applications, yet their security remains largely unexplored. We propose the Black-Hole Attack, a poisoning attack that injects a small number of malicious vectors near the geometric center of the stored vectors. These injected vectors attract queries like a black hole and frequently appear in the top-k retrieval results for most queries. This attack is enabled by a phenomenon we term centrality-driven hubness: in high-dimensional embedding spaces, vectors near the centroid become nearest neighbors of a disproportionately large number of other vectors, while this centroid region is nearly empty in practice. The attack shows that vectors in a vector database cannot be blindly trusted: geometric defects in high-dimensional embeddings make retrieval inherently vulnerable. Our experiments show that malicious vectors appear in up…
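The geometric intuition in the abstract can be reproduced on toy data. The sketch below is not the authors' implementation: it assumes isotropic Gaussian vectors as a stand-in for real embeddings, exact brute-force L2 search instead of an approximate index, and hypothetical names and parameters (`topk_indices`, `black_hole_hit_rate`, the 0.01 noise scale). It injects a handful of vectors next to the centroid and measures the fraction of queries whose top-k contains at least one of them.

```python
import numpy as np


def topk_indices(queries, database, k):
    """Return the indices of the k nearest database rows (L2) for each query, unordered."""
    # Squared L2 distance via ||q||^2 - 2 q.x + ||x||^2, shape (n_queries, n_database).
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ database.T
        + (database ** 2).sum(axis=1)
    )
    # argpartition places the k smallest distances in the first k columns.
    return np.argpartition(d2, k - 1, axis=1)[:, :k]


def black_hole_hit_rate(n_clean=2000, n_malicious=10, n_queries=200,
                        dim=128, k=10, seed=0):
    """Fraction of queries whose top-k contains at least one injected vector."""
    rng = np.random.default_rng(seed)

    # Assumption: synthetic Gaussian data stands in for real embeddings.
    clean = rng.standard_normal((n_clean, dim))
    queries = rng.standard_normal((n_queries, dim))

    # Place the malicious vectors in a tight cluster around the centroid.
    centroid = clean.mean(axis=0)
    malicious = centroid + 0.01 * rng.standard_normal((n_malicious, dim))

    # Malicious rows are appended last, so their indices are >= n_clean.
    database = np.vstack([clean, malicious])
    top = topk_indices(queries, database, k)
    return float((top >= n_clean).any(axis=1).mean())


if __name__ == "__main__":
    print(f"queries with a malicious vector in top-10: {black_hole_hit_rate():.2%}")
```

In this toy setting the effect follows from simple arithmetic: the expected squared distance from a query to the centroid is roughly `dim`, while the expected squared distance to an independent data point is roughly `2 * dim`, so the centroid region beats almost every stored vector. Real embeddings are neither isotropic nor Gaussian, so the numbers here say nothing about the paper's reported results.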
