LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents

Hyesung Jeon; Hyeongju Ha; Jae-Joon Kim

arXiv:2602.01053·cs.LG·February 3, 2026

TL;DR

LRAgent is a memory- and compute-efficient KV cache sharing framework for multi-LoRA LLM agents. It decomposes each cache into a shared backbone component and an adapter-specific component, making multi-agent systems faster and less resource-intensive.

Contribution

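The paper proposes LRAgent, a framework that decomposes the KV cache of multi-LoRA agents into a base component from the pretrained weights and an adapter-dependent component from the LoRA weights, and shares the base component across agents. This reduces memory and compute overhead while maintaining accuracy.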

Findings

01 Significant reduction in memory usage for KV caches.

02 Near-baseline accuracy in agentic question-answering tasks.

03 Improved throughput and latency in multi-LoRA agent systems.

Abstract

Role specialization in multi-LLM agent systems is often realized via multi-LoRA, where agents share a pretrained backbone and differ only through lightweight adapters. Despite sharing base model weights, each agent independently builds and stores its own KV cache for the same long, tool-augmented trajectories, incurring substantial memory and compute overhead. Existing KV cache sharing methods largely overlook this multi-LoRA setting. We observe that, across agents, cache differences are dominated by adapter outputs, while activations from the shared pretrained backbone remain highly similar. Based on this observation, we propose LRAgent, a KV cache sharing framework for multi-LoRA agents that decomposes the cache into a shared base component from the pretrained weights and an adapter-dependent component from LoRA weights. LRAgent reduces memory overhead by sharing the base component…
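The decomposition described in the abstract can be illustrated with a small NumPy sketch. It assumes a single key projection with a standard LoRA update, `W_k + A @ B`, and assumes every agent sees identical input hidden states `x`; the abstract only says backbone activations are highly similar, so this is an idealization. Storing the low-rank intermediate `x @ A` per agent is a choice made for this sketch and is not confirmed by the truncated abstract. All names and shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: sequence length, model width, head width, LoRA rank, agents.
tokens, d_model, d_head, rank, n_agents = 16, 64, 32, 4, 3

x = rng.standard_normal((tokens, d_model))    # hidden states, assumed identical for all agents
w_k = rng.standard_normal((d_model, d_head))  # shared pretrained key projection

# One (A, B) LoRA pair per agent; the effective weight is w_k + A @ B.
adapters = [
    (rng.standard_normal((d_model, rank)), rng.standard_normal((rank, d_head)))
    for _ in range(n_agents)
]

# Base component: computed and stored once, shared by every agent.
shared_base = x @ w_k

# Adapter-dependent component: kept per agent in low-rank form (assumption of this sketch).
per_agent_low_rank = [x @ a for a, _ in adapters]


def reconstruct_keys(agent: int) -> np.ndarray:
    """Rebuild one agent's full key cache from the shared and low-rank parts."""
    _, b = adapters[agent]
    return shared_base + per_agent_low_rank[agent] @ b


# Cache size in stored values: one full cache per agent vs. shared base plus low-rank parts.
naive_values = n_agents * tokens * d_head
shared_values = tokens * d_head + n_agents * tokens * rank
```

Under these assumptions, `reconstruct_keys(i)` matches `x @ (w_k + A_i @ B_i)` up to floating-point error, and the stored value count falls whenever the rank is much smaller than the head width. In a real multi-layer model the agents' hidden states diverge after the first adapted layer, so sharing the base component is an approximation rather than an identity.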

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Mobile Crowdsensing and Crowdsourcing