LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents
Hyesung Jeon, Hyeongju Ha, Jae-Joon Kim

TL;DR
LRAgent is a memory- and compute-efficient KV cache sharing framework for multi-LoRA LLM agents. It decomposes each cache into a shared backbone component and an adapter-specific component, making agent systems faster and more resource-efficient.
Contribution
The paper proposes LRAgent, a framework that decomposes and shares KV caches across multi-LoRA agents, reducing memory and compute overhead while keeping accuracy near the baseline.
Findings
Significant reduction in memory usage for KV caches.
Near-baseline accuracy in agentic question-answering tasks.
Improved throughput and latency in multi-LoRA agent systems.
Abstract
Role specialization in multi-LLM agent systems is often realized via multi-LoRA, where agents share a pretrained backbone and differ only through lightweight adapters. Despite sharing base model weights, each agent independently builds and stores its own KV cache for the same long, tool-augmented trajectories, incurring substantial memory and compute overhead. Existing KV cache sharing methods largely overlook this multi-LoRA setting. We observe that, across agents, cache differences are dominated by adapter outputs, while activations from the shared pretrained backbone remain highly similar. Based on this observation, we propose LRAgent, a KV cache sharing framework for multi-LoRA agents that decomposes the cache into a shared base component from the pretrained weights and an adapter-dependent component from LoRA weights. LRAgent reduces memory overhead by sharing the base component…
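The decomposition described in the abstract rests on a simple identity for a LoRA-adapted projection: x(W + AB) = xW + (xA)B. The following is a minimal NumPy sketch of that identity for a single key projection, not the authors' implementation. The toy dimensions, the names `base_cache`, `lowrank_caches`, and `reconstruct_keys`, and the choice to store the low-rank intermediate xA per agent are all assumptions made for illustration; the abstract is truncated before it describes how the adapter-dependent component is actually handled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed): tokens, hidden size, LoRA rank, number of agents.
T, d, r, n_agents = 6, 16, 2, 3

x = rng.standard_normal((T, d))    # hidden states of the shared trajectory
W_k = rng.standard_normal((d, d))  # frozen key projection of the backbone

# One hypothetical LoRA pair (A, B) per agent; the weight update is A @ B.
adapters = [
    (rng.standard_normal((d, r)), rng.standard_normal((r, d)))
    for _ in range(n_agents)
]

# Baseline: each agent computes and stores its own full (T, d) key cache.
full_caches = [x @ (W_k + A @ B) for A, B in adapters]

# Decomposed: the base component is computed once and shared by all agents.
base_cache = x @ W_k

# Assumption: each agent keeps only the (T, r) low-rank intermediate x @ A.
lowrank_caches = [x @ A for A, _ in adapters]


def reconstruct_keys(agent_id: int) -> np.ndarray:
    """Rebuild one agent's keys from the shared base plus its adapter part."""
    _, B = adapters[agent_id]
    return base_cache + lowrank_caches[agent_id] @ B


# Stored float counts under each scheme.
baseline_floats = sum(c.size for c in full_caches)
decomposed_floats = base_cache.size + sum(c.size for c in lowrank_caches)

for i in range(n_agents):
    assert np.allclose(reconstruct_keys(i), full_caches[i])

print(baseline_floats, decomposed_floats)
```

In this single-layer toy every agent sees the same input, so the base component is exactly identical across agents. In a multi-layer model the inputs to deeper layers already depend on each agent's adapters, which is consistent with the abstract describing backbone activations as highly similar rather than identical; sharing the base component there would be an approximation.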
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Mobile Crowdsensing and Crowdsourcing
