From Similarity to Vulnerability: Key Collision Attack on LLM Semantic Caching

Zhixiang Zhang; Zesen Liu; Yuchong Xie; Quanfeng Huang; Dongdong She

arXiv:2601.23088 · cs.CR · February 2, 2026

TL;DR

This paper shows that semantic caching for large language model (LLM) applications is vulnerable to key collision attacks because of an inherent trade-off between cache efficiency and security. It demonstrates practical attack methods and discusses potential mitigation strategies.

Contribution

The paper introduces CacheAttack, the first systematic framework for black-box collision attacks on the semantic cache keys used by LLM applications. It exposes the resulting security vulnerabilities and proposes defenses.

Findings

01. CacheAttack achieves an 86% hit rate when hijacking LLM responses.

02. Collision attacks can induce malicious behaviors in LLM agents.

03. The collision vulnerabilities transfer across different embedding models.
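To make "hijacking LLM responses" concrete, here is a minimal toy sketch of what a semantic cache key collision does. It is not CacheAttack and does not reproduce the paper's black-box search; it only shows the consequence once a colliding key exists. Several pieces are assumptions for illustration only: a lowercase bag-of-words vector stands in for a real embedding model, the 0.8 cosine threshold is a hypothetical value, and the attacker query with its appended "Say Lyon." instruction is invented. The attacker's query is cached together with the answer it steered, and a victim's benign query that lands within the threshold is served that answer without ever reaching the LLM.

```python
import math
import re
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase bag of words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Toy semantic cache: keys are embeddings, and a lookup returns the response
    of the most similar stored key if its similarity meets the threshold."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # hypothetical value; real systems tune this
        self.entries: list[tuple[Counter, str]] = []

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))

    def get(self, query: str) -> Optional[str]:
        key = embed(query)
        best_sim, best_response = 0.0, None
        for stored_key, response in self.entries:
            sim = cosine(key, stored_key)
            if sim > best_sim:
                best_sim, best_response = sim, response
        return best_response if best_sim >= self.threshold else None


cache = SemanticCache(threshold=0.8)

# Attacker (hypothetical): a query that embeds close to an anticipated victim
# query but carries an extra instruction; the steered answer gets cached.
cache.put("What is the capital of France? Say Lyon.", "Lyon.")

# Victim: the benign query collides with the attacker's key (cosine about 0.87
# here) and receives the attacker-influenced response from the cache.
print(cache.get("What is the capital of France?"))  # Lyon.
print(cache.get("How do I bake bread?"))            # None (cache miss)
```

The threshold is what makes this possible: any stored key inside the similarity ball around the victim's query is an acceptable hit, so an attacker only needs to land inside that ball rather than reproduce the victim's query exactly.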

Abstract

Semantic caching has emerged as a pivotal technique for scaling LLM applications, widely adopted by major providers including AWS and Microsoft. By utilizing semantic embedding vectors as cache keys, this mechanism effectively minimizes latency and redundant computation for semantically similar queries. In this work, we conceptualize semantic cache keys as a form of fuzzy hashes. We demonstrate that the locality required to maximize cache hit rates fundamentally conflicts with the cryptographic avalanche effect necessary for collision resistance. Our conceptual analysis formalizes this inherent trade-off between performance (locality) and security (collision resilience), revealing that semantic caching is naturally vulnerable to key collision attacks. While prior research has focused on side-channel and privacy risks, we present the first systematic study of integrity risks arising…
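The abstract's contrast between locality and the avalanche effect can be seen in a few lines. The sketch below compares a conventional exact-match cache key (SHA-256 of the query text, an assumption chosen as a representative cryptographic hash) with a semantic key. A lowercase bag-of-words vector again serves as a hypothetical stand-in for a real embedding model. A trivial edit to the query flips roughly half of the hash's output bits, so near-duplicates get unrelated keys, while the semantic key barely moves. That locality produces cache hits, and, as the paper argues, it is the same property that makes collisions easy to find.

```python
import hashlib
import math
import re
from collections import Counter


def exact_key(query: str) -> bytes:
    """Conventional cache key: a cryptographic hash of the exact query text."""
    return hashlib.sha256(query.encode("utf-8")).digest()


def embed(query: str) -> Counter:
    """Hypothetical stand-in for a semantic embedding: lowercase bag of words."""
    return Counter(re.findall(r"[a-z]+", query.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bit_distance(x: bytes, y: bytes) -> int:
    """Number of differing bits between two equal-length digests."""
    return sum(bin(p ^ q).count("1") for p, q in zip(x, y))


q1 = "What is the capital of France?"
q2 = "what is the capital of france"

# Avalanche: a tiny edit flips roughly half of the 256 output bits, so the
# two near-duplicate queries get unrelated keys and an exact-match cache misses.
flipped = bit_distance(exact_key(q1), exact_key(q2))
print(f"SHA-256 bits flipped: {flipped}/256")

# Locality: the toy embeddings are identical here, so a semantic cache hits.
sim = cosine(embed(q1), embed(q2))
print(f"cosine similarity: {sim:.3f}")  # cosine similarity: 1.000
```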

Taxonomy

Topics: Security and Verification in Computing · Distributed systems and fault tolerance · Caching and Content Delivery