Q-KVComm: Efficient Multi-Agent Communication Via Adaptive KV Cache Compression

Boris Kriuk; Logic Ng

arXiv:2512.17914 · cs.CL · December 23, 2025

TL;DR

Q-KVComm introduces an adaptive, compressed KV cache protocol for multi-agent LLM systems, significantly reducing bandwidth while preserving semantic integrity across diverse tasks and models.

Contribution

The paper presents an adaptive quantization scheme and a hybrid information-extraction method for transmitting KV caches directly between agents, enabling efficient multi-agent communication.

Findings

01. Achieves 5-6x compression ratios with high semantic fidelity.

02. Maintains coherence scores above 0.77 across datasets.

03. Works effectively across models from 1.1B to 1.5B parameters.

Abstract

Multi-agent Large Language Model (LLM) systems face a critical bottleneck: redundant transmission of contextual information between agents consumes excessive bandwidth and computational resources. Traditional approaches discard internal semantic representations and transmit raw text, forcing receiving agents to recompute similar representations from scratch. We introduce Q-KVComm, a new protocol that enables direct transmission of compressed key-value (KV) cache representations between LLM agents. Q-KVComm combines three key innovations: (1) adaptive layer-wise quantization that allocates variable bit-widths based on sensitivity profiling, (2) hybrid information extraction that preserves critical facts across content domains, and (3) heterogeneous model calibration establishing cross-architecture communication. Extensive experiments across three diverse question-answering datasets…
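
The first innovation named in the abstract, layer-wise quantization with bit-widths chosen by sensitivity profiling, can be illustrated with a small sketch. This is not the authors' implementation: the sensitivity proxy (round-trip mean squared error at a probe bit-width), the equal-sized ranking groups, the bit choices (2, 3, 4), the 16-bit baseline, and the synthetic sine-wave "layers" are all assumptions made for illustration, and every function name is hypothetical.

```python
import math

def quantize(values, bits):
    """Uniform min-max quantization of a flat list of floats to `bits` bits."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return [c * scale + lo for c in codes]

def sensitivity(values, probe_bits=4):
    """Assumed proxy: mean squared error of a quantize/dequantize round trip."""
    codes, scale, lo = quantize(values, probe_bits)
    recon = dequantize(codes, scale, lo)
    return sum((a - b) ** 2 for a, b in zip(values, recon)) / len(values)

def allocate_bits(scores, bit_choices=(2, 3, 4)):
    """Rank layers by score and give more bits to the more sensitive groups."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # least sensitive first
    bits = [0] * len(scores)
    for rank, layer in enumerate(order):
        group = rank * len(bit_choices) // len(scores)
        bits[layer] = bit_choices[group]
    return bits

def compression_ratio(bits_per_layer, sizes, baseline_bits=16):
    """Payload-only ratio; per-layer scale/offset metadata is ignored here."""
    original = baseline_bits * sum(sizes)
    compressed = sum(b * n for b, n in zip(bits_per_layer, sizes))
    return original / compressed

# Synthetic stand-ins for per-layer KV tensors: same shape, different spread.
spreads = [0.5, 4.0, 0.125, 2.0, 0.25, 1.0]
layers = [[s * math.sin(i) for i in range(256)] for s in spreads]

scores = [sensitivity(layer) for layer in layers]
bits = allocate_bits(scores)
ratio = compression_ratio(bits, [len(layer) for layer in layers])

print(bits)             # [3, 4, 2, 4, 2, 3]
print(round(ratio, 2))  # 5.33
```

In this toy setup the layers with the widest value range receive 4 bits and the narrowest receive 2, for an average of 3 bits per value. Against the assumed 16-bit baseline that gives 16 / 3 ≈ 5.33x, which shows how an average bit-width near 3 corresponds to a ratio in the 5-6x range; it says nothing about the bit-widths the paper actually selects.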

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications