Auditing Prompt Caching in Language Model APIs

Chenchen Gu; Xiang Lisa Li; Rohith Kuditipudi; Percy Liang; Tatsunori Hashimoto

arXiv:2502.07776 · cs.CL · July 15, 2025

TL;DR

This paper investigates prompt caching in large language model APIs, revealing privacy risks from timing side-channel attacks and uncovering proprietary model architecture details through statistical audits.

Contribution

It introduces statistical audits to detect prompt caching in real-world LLM APIs and examines the privacy and security implications of shared caches and the timing variations they cause.

Findings

01

Detected global cache sharing in seven API providers, including OpenAI.

02

Identified timing-based privacy leakage about user prompts.

03

Found evidence that OpenAI's embedding model is a decoder-only Transformer.
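
The reasoning behind this kind of architectural inference can be illustrated with a property of attention masking. Under causal (decoder-only) attention, the hidden states of a prompt prefix do not depend on any later tokens, so they can be computed once and reused across prompts that share the prefix. Under bidirectional (encoder-style) attention, every position attends to the whole input, so prefix states change whenever the suffix changes and cannot be reused. The toy sketch below checks this property directly. It assumes a single attention layer with identity query/key/value projections, and the function names are hypothetical. It is not the paper's audit code.

```python
import math
import random


def attention(xs, causal):
    """Toy single-head self-attention with identity Q/K/V projections (an assumption).

    xs: list of equal-length vectors. Returns one output vector per position.
    """
    d = len(xs[0])
    outputs = []
    for i, q in enumerate(xs):
        # With causal masking, position i may only attend to positions 0..i.
        visible = xs[: i + 1] if causal else xs
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, visible)) / total for j in range(d)]
        )
    return outputs


def prefix_states_reusable(prefix, suffix_a, suffix_b, causal):
    """True if the prefix positions get identical outputs regardless of the suffix,
    i.e. those states could be computed once and served from a prefix cache."""
    n = len(prefix)
    out_a = attention(prefix + suffix_a, causal)[:n]
    out_b = attention(prefix + suffix_b, causal)[:n]
    return all(
        math.isclose(x, y, rel_tol=1e-12, abs_tol=1e-12)
        for va, vb in zip(out_a, out_b)
        for x, y in zip(va, vb)
    )


rng = random.Random(0)


def random_vector(dim=4):
    """Hypothetical token representation: a random Gaussian vector."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


prefix = [random_vector() for _ in range(3)]
suffix_a = [random_vector() for _ in range(2)]
suffix_b = [random_vector() for _ in range(2)]

print(prefix_states_reusable(prefix, suffix_a, suffix_b, causal=True))   # True
print(prefix_states_reusable(prefix, suffix_a, suffix_b, causal=False))  # False
```

In a real multi-layer model the same argument applies layer by layer. A provider can only serve cache hits for prompts that merely share a prefix if the prefix computation is independent of what follows, which is the case for causal attention.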

Abstract

Prompt caching in large language models (LLMs) results in data-dependent timing variations: cached prompts are processed faster than non-cached prompts. These timing differences introduce the risk of side-channel timing attacks. For example, if the cache is shared across users, an attacker could identify cached prompts from fast API response times to learn information about other users' prompts. Because prompt caching may cause privacy leakage, transparency around the caching policies of API providers is important. To this end, we develop and conduct statistical audits to detect prompt caching in real-world LLM API providers. We detect global cache sharing across users in seven API providers, including OpenAI, resulting in potential privacy leakage about users' prompts. Timing variations due to prompt caching can also result in leakage of information about model architecture. Namely, we…
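Below is a minimal sketch of such a statistical audit. It assumes two samples of response times have already been collected. One sample comes from a procedure expected to produce cache hits, such as resending a prompt shortly after it was first sent. The other comes from prompts expected to be cache misses, such as fresh random prompts. The sketch compares the two samples with a two-sample Kolmogorov-Smirnov statistic and a permutation p-value. The timings are simulated, and the choice of test, the function names, and the parameters are illustrative assumptions, not the authors' implementation.

```python
import random


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        t = min(xs[i], ys[j])
        # Step both empirical CDFs past every sample <= t so ties are handled correctly.
        while i < len(xs) and xs[i] <= t:
            i += 1
        while j < len(ys) and ys[j] <= t:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def permutation_p_value(hits, misses, n_permutations=2000, seed=0):
    """Approximate p-value for H0: both samples come from the same timing distribution."""
    rng = random.Random(seed)
    observed = ks_statistic(hits, misses)
    pooled = list(hits) + list(misses)
    n = len(hits)
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the labels "hit" and "miss" are exchangeable, so shuffle them.
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n], pooled[n:]) >= observed:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value valid and strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def simulate_times(n, mean_ms, rng):
    """Hypothetical response times in milliseconds (Gaussian noise, clipped at 1 ms)."""
    return [max(1.0, rng.gauss(mean_ms, 40.0)) for _ in range(n)]


rng = random.Random(42)
# Assumed simulation only: cached prompts ~300 ms, uncached prompts ~450 ms.
hits = simulate_times(50, 300.0, rng)
misses = simulate_times(50, 450.0, rng)
p = permutation_p_value(hits, misses)
print(f"KS = {ks_statistic(hits, misses):.2f}, p = {p:.4f}")
if p < 0.01:
    print("Reject H0: timing difference is consistent with prompt caching")
```

A small p-value indicates that the "hit" procedure is reliably faster, which is the signature of caching. Running the "hit" procedure from a different account than the one that first sent the prompt is one way to probe whether the cache is shared across users.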

Code & Models

Repositories

chenchenygu/auditing-prompt-caching (official)

Taxonomy

Topics: Service-Oriented Architecture and Web Services · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing · Byte Pair Encoding