When KV Cache Reuse Fails in Multi-Agent Systems: Cross-Candidate Interaction is Crucial for LLM Judges

Sichu Liang; Zhenglin Wang; Jiajia Chu; Pengfei Xia; Hui Zang; Deyu Zhou

arXiv:2601.08343 · cs.MA · January 14, 2026

Open Access

TL;DR

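KV cache reuse in multi-agent LLM systems can make an LLM judge's selections inconsistent with those made under dense prefill, even when end-task accuracy looks stable. Reuse weakens cross-candidate attention, especially for later candidate blocks, so judge-centric inference needs dedicated design.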

Contribution

The paper identifies a failure mode in which KV cache reuse perturbs judge behavior, and argues that explicit cross-candidate interaction is important for reliable judge inference.

Findings

01 KV cache reuse can weaken cross-candidate attention in judges

02 Judge consistency is compromised by reuse strategies, especially in later candidates

03 Explicit cross-candidate interaction is essential for preserving decision accuracy

Abstract

Multi-agent LLM systems routinely generate multiple candidate responses that are aggregated by an LLM judge. To reduce the dominant prefill cost in such pipelines, recent work advocates KV cache reuse across partially shared contexts and reports substantial speedups for generation agents. In this work, we show that these efficiency gains do not transfer uniformly to judge-centric inference. Across GSM8K, MMLU, and HumanEval, we find that reuse strategies that are effective for execution agents can severely perturb judge behavior: end-task accuracy may appear stable, yet the judge's selection becomes highly inconsistent with dense prefill. We quantify this risk using Judge Consistency Rate (JCR) and provide diagnostics showing that reuse systematically weakens cross-candidate attention, especially for later candidate blocks. Our ablation further demonstrates that explicit cross-candidate…
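The abstract names Judge Consistency Rate (JCR) but the excerpt here does not give its formula. The sketch below assumes one plausible reading: JCR is the fraction of instances on which the judge, run with KV cache reuse, selects the same candidate it selects under dense prefill. The function names, the integer candidate indices, and the toy data are all hypothetical and are not taken from the paper. The example also illustrates how end-task accuracy can look stable while JCR is low, which happens when several candidates for a question are correct.

```python
from typing import Sequence, Set


def judge_consistency_rate(dense_choices: Sequence[int],
                           reuse_choices: Sequence[int]) -> float:
    """Assumed definition: share of instances where the judge picks the
    same candidate index under reuse as under dense prefill."""
    if len(dense_choices) != len(reuse_choices):
        raise ValueError("choice lists must have the same length")
    if not dense_choices:
        raise ValueError("need at least one instance")
    matches = sum(d == r for d, r in zip(dense_choices, reuse_choices))
    return matches / len(dense_choices)


def end_task_accuracy(choices: Sequence[int],
                      correct: Sequence[Set[int]]) -> float:
    """Share of instances where the selected candidate is a correct one."""
    if len(choices) != len(correct):
        raise ValueError("choices and correct sets must have the same length")
    hits = sum(c in ok for c, ok in zip(choices, correct))
    return hits / len(choices)


if __name__ == "__main__":
    # Toy data (hypothetical): index of the candidate the judge selected
    # on four questions, under dense prefill and under KV cache reuse.
    dense = [0, 1, 2, 0]
    reuse = [1, 1, 0, 2]
    # For each question, the set of candidate indices that are correct.
    correct = [{0, 1}, {1}, {0, 2}, {0, 2}]

    print("accuracy (dense):", end_task_accuracy(dense, correct))
    print("accuracy (reuse):", end_task_accuracy(reuse, correct))
    print("JCR:", judge_consistency_rate(dense, reuse))
```

On this toy data both accuracies are 1.0 while the assumed JCR is 0.25: the judge still lands on a correct candidate each time, but on three of the four questions it is a different candidate than the one chosen under dense prefill.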

Taxonomy

Topics: Software System Performance and Reliability · Distributed systems and fault tolerance · Cloud Computing and Resource Management