Modeling and Optimizing Latency for Delayed Hit Caching with Stochastic Miss Latency
Bowen Jiang, Chaofan Ma

TL;DR
This paper analyzes the impact of stochastic fetch latencies on delayed hit caching and introduces a variance-aware caching strategy that improves latency performance.
Contribution
It provides the first theoretical analysis of delayed hits with stochastic latencies and proposes a novel cache eviction method based on variance considerations.
Findings
Significant latency reduction on synthetic datasets (3%-30%)
Moderate latency reduction on real-world traces (1%-7%)
First analytical expressions for mean and variance of delays with stochastic fetches
Abstract
Caching is crucial for system performance, but the delayed hit phenomenon, where requests queue during lengthy fetches after a cache miss, significantly degrades user-perceived latency in modern high-throughput systems. While prior works address delayed hits by estimating aggregate delay, they universally assume deterministic fetch latencies. This paper tackles the more realistic, yet unexplored, scenario where fetch latencies are stochastic. We present, to our knowledge, the first theoretical analysis of delayed hits under this condition, deriving analytical expressions for both the mean and variance of the aggregate delay assuming exponentially distributed fetch latency. Leveraging these insights, we develop a novel variance-aware ranking function tailored for this stochastic setting to guide cache eviction decisions more effectively. The simulations on synthetic and real-world…
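To make the setting in the abstract concrete, the following is a minimal Monte Carlo sketch of the aggregate delay of a single miss episode. It is not the paper's derivation or its ranking function. It assumes a textbook delayed-hit model: one miss triggers a fetch whose latency is exponentially distributed, requests for the same object arrive as a Poisson process while the fetch is in flight, and each such request waits for the remaining fetch time. The names `aggregate_delay`, `estimate`, `rate`, and `mean_latency` are hypothetical and chosen only for illustration. Under these assumptions the mean aggregate delay works out to `mean_latency + rate * mean_latency**2`, which the simulation can be checked against.

```python
import random
import statistics


def aggregate_delay(rate, mean_latency, rng):
    """Total delay of one miss episode (assumed model, not the paper's code).

    rate         -- Poisson arrival rate of requests for the object
    mean_latency -- mean of the exponentially distributed fetch latency
    rng          -- a random.Random instance, passed in for reproducibility
    """
    fetch = rng.expovariate(1.0 / mean_latency)  # stochastic fetch latency
    total = fetch                                # the missing request waits the full fetch
    t = rng.expovariate(rate)                    # first arrival after the miss
    while t < fetch:
        total += fetch - t                       # a delayed hit waits the remaining fetch time
        t += rng.expovariate(rate)               # next Poisson arrival
    return total


def estimate(rate, mean_latency, episodes=100_000, seed=0):
    """Sample mean and sample variance of the aggregate delay over many episodes."""
    rng = random.Random(seed)
    samples = [aggregate_delay(rate, mean_latency, rng) for _ in range(episodes)]
    return statistics.fmean(samples), statistics.variance(samples)


if __name__ == "__main__":
    rate, mean_latency = 2.0, 1.0
    mean, var = estimate(rate, mean_latency)
    closed_form_mean = mean_latency + rate * mean_latency ** 2
    print(f"simulated mean = {mean:.3f}, closed-form mean = {closed_form_mean:.3f}")
    print(f"simulated variance = {var:.3f}")
```

With a deterministic fetch latency of the same mean, the corresponding expected aggregate delay in this model would be `mean_latency + rate * mean_latency**2 / 2`, so the sketch also illustrates why assuming deterministic latency can underestimate the cost of a miss.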
Taxonomy
Topics: Caching and Content Delivery · Advanced Data Storage Technologies
