Auditing Prompt Caching in Language Model APIs
Chenchen Gu, Xiang Lisa Li, Rohith Kuditipudi, Percy Liang, Tatsunori Hashimoto

TL;DR
This paper investigates prompt caching in large language model APIs, revealing privacy risks from timing side-channel attacks and uncovering proprietary model architecture details through statistical audits.
Contribution
It introduces methods to detect prompt caching in real-world APIs and uncovers privacy and security implications of shared cache and timing variations.
Findings
Detected global cache sharing in seven API providers, including OpenAI.
Identified timing-based privacy leakage about user prompts.
Revealed that OpenAI's embedding model is a decoder-only Transformer.
Abstract
Prompt caching in large language models (LLMs) results in data-dependent timing variations: cached prompts are processed faster than non-cached prompts. These timing differences introduce the risk of side-channel timing attacks. For example, if the cache is shared across users, an attacker could identify cached prompts from fast API response times to learn information about other users' prompts. Because prompt caching may cause privacy leakage, transparency around the caching policies of API providers is important. To this end, we develop and conduct statistical audits to detect prompt caching in real-world LLM API providers. We detect global cache sharing across users in seven API providers, including OpenAI, resulting in potential privacy leakage about users' prompts. Timing variations due to prompt caching can also result in leakage of information about model architecture. Namely, we…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsService-Oriented Architecture and Web Services · Natural Language Processing Techniques · Speech and dialogue systems
MethodsAttention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing · Byte Pair Encoding
