Statistical Independence Aware Caching for LLM Workflows

Yihan Dai; Dimitrios Stamatios Bouras; Haoxiang Jia; Sergey Mechtaev

arXiv:2511.22118·cs.SE·December 1, 2025

Open Access

TL;DR

This paper introduces Mnimi, a cache design pattern that ensures statistical independence in LLM workflows, improving reproducibility and efficiency without compromising probabilistic properties.

Contribution

The paper presents Mnimi, a novel cache design pattern that enforces statistical constraints in LLM workflows through type encapsulation, addressing limitations of existing caching systems.

Findings

1. Mnimi improves reproducibility and debugging in LLM workflows.
2. It reduces time and cost in LLM-based systems.
3. Mnimi maintains statistical correctness in caching.

Abstract

Large language model (LLM) inference is both expensive and slow. Local caching of responses offers a practical way to reduce the cost and latency of LLM queries. In research contexts, caching also enhances reproducibility and provides flexibility for experimentation. However, naive reuse of cached responses compromises statistical independence, a critical property for probabilistic workflows. In applications of LLMs to code, statistical independence underpins performance metrics such as Pass@k and uncertainty estimation, as well as algorithms like program repair loops and retries. Existing LLM caching systems lack ways to enforce statistical independence constraints. To address this, we introduce Mnimi, a cache design pattern that supports modular LLM workflows while ensuring statistical integrity at the component level. Its core innovation lies in encapsulating statistical constraints within the type…
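
The independence concern raised in the abstract can be illustrated with a small Python sketch. It is not Mnimi's API, which the truncated abstract does not describe; the names `NaiveCache`, `IndependentCache`, and `fake_llm` are hypothetical, and the stand-in "model" is a counter rather than a real LLM. The sketch assumes one plausible way to keep repeated queries independent within a run while still replaying them across runs: store a list of samples per prompt and advance a cursor through it.

```python
import itertools
from collections import defaultdict
from typing import Callable, Dict, List


class NaiveCache:
    """Hypothetical cache: every repeat of a prompt returns the same response."""

    def __init__(self, sample: Callable[[str], str]) -> None:
        self._sample = sample
        self._store: Dict[str, str] = {}

    def query(self, prompt: str) -> str:
        if prompt not in self._store:
            self._store[prompt] = self._sample(prompt)
        return self._store[prompt]


class IndependentCache:
    """Hypothetical cache: the i-th request for a prompt within a run returns
    the i-th stored sample, so repeats within a run are separate draws."""

    def __init__(self, sample: Callable[[str], str]) -> None:
        self._sample = sample
        self._store: Dict[str, List[str]] = defaultdict(list)
        self._cursor: Dict[str, int] = defaultdict(int)

    def query(self, prompt: str) -> str:
        i = self._cursor[prompt]
        samples = self._store[prompt]
        if i == len(samples):
            # No stored sample left for this position: draw a fresh one.
            samples.append(self._sample(prompt))
        self._cursor[prompt] = i + 1
        return samples[i]

    def reset(self) -> None:
        """Begin a new run: replay stored samples from the start."""
        self._cursor.clear()


# Stand-in for an LLM call (assumption): each call yields a distinct string.
calls = itertools.count()


def fake_llm(prompt: str) -> str:
    return f"{prompt}#sample{next(calls)}"


naive = NaiveCache(fake_llm)
naive_out = [naive.query("fix bug") for _ in range(3)]

indep = IndependentCache(fake_llm)
first_run = [indep.query("fix bug") for _ in range(3)]
indep.reset()
second_run = [indep.query("fix bug") for _ in range(3)]
```

With the naive cache, three "attempts" are one response repeated, so a metric such as Pass@k computed over them would effectively see a single sample. With the cursor-based cache, the three attempts are distinct draws, and a second run after `reset()` replays them without new model calls.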

Taxonomy

Topics: Scientific Computing and Data Management · Parallel Computing and Optimization Techniques · Machine Learning and Algorithms