POLAR: Online Learning for LoRA Adapter Caching and Routing in Edge LLM Serving

Shaoang Li; Jian Li

arXiv:2604.16583 · cs.LG · April 21, 2026

TL;DR

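POLAR is an online learning method for edge LLM serving that jointly decides which LoRA adapters stay resident in fast memory and which adapter serves each request, with the aim of reducing the latency incurred when a non-resident adapter must be paged in from storage.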

Contribution

The paper formulates the joint adapter caching-and-routing problem as a two-timescale contextual bandit and proposes POLAR (Paging and Online Learning for Adapter Routing), an algorithm supported by theoretical regret bounds and by experiments against non-adaptive baselines.

Findings

01. POLAR outperforms non-adaptive baselines in experiments.

02. Theoretical regret bounds are established for the proposed algorithms.

03. Adaptive cache control significantly reduces latency in edge LLM serving.

Abstract

Edge deployment of large language models (LLMs) increasingly relies on libraries of lightweight LoRA adapters, yet GPU/DRAM can keep only a small resident subset at a time. Serving a request through a non-resident adapter requires paging its weights from storage, incurring measurable latency. This creates a two-timescale online control problem: on a slow timescale, the system selects which adapters remain resident in fast memory, while on a fast timescale it routes each request to an adapter whose context-dependent utility is unknown a priori. The two decisions are tightly coupled: the cache determines the cost of exploration, and the router determines which adapters receive informative feedback. We formulate this joint caching-and-routing problem as a two-timescale contextual bandit and propose POLAR (Paging and Online Learning for Adapter Routing). POLAR pairs a cache-aware LinUCB…
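
The coupling between the two timescales described in the abstract can be illustrated with a minimal sketch. This is not the paper's POLAR algorithm, whose details are truncated above. It assumes one disjoint LinUCB model per adapter, a fixed utility penalty `page_cost` for routing to a non-resident adapter, and a simple pull-count heuristic for the slow-timescale cache refresh. The class, method, and parameter names are all hypothetical.

```python
import numpy as np


class CacheAwareLinUCB:
    """Illustrative sketch only; NOT the paper's POLAR algorithm."""

    def __init__(self, n_adapters, dim, cache_size, alpha=1.0, page_cost=0.3):
        self.alpha = alpha            # exploration width (assumed hyperparameter)
        self.page_cost = page_cost    # assumed penalty for paging in a non-resident adapter
        self.cache_size = cache_size
        # Per-adapter ridge-regression statistics: A = I + sum(x x^T), b = sum(r x).
        self.A = np.stack([np.eye(dim) for _ in range(n_adapters)])
        self.b = np.zeros((n_adapters, dim))
        self.pulls = np.zeros(n_adapters, dtype=int)
        self.resident = set(range(cache_size))  # arbitrary initial cache contents

    def ucb(self, x):
        """Optimistic utility estimate of every adapter for context x."""
        A_inv = np.linalg.inv(self.A)                         # shape (n, d, d)
        theta = np.einsum("kij,kj->ki", A_inv, self.b)        # shape (n, d)
        width = np.sqrt(np.einsum("i,kij,j->k", x, A_inv, x)) # shape (n,)
        return theta @ x + self.alpha * width

    def route(self, x):
        """Fast timescale: pick the adapter with the best UCB net of paging cost."""
        scores = self.ucb(x)
        penalty = np.array(
            [0.0 if k in self.resident else self.page_cost for k in range(len(scores))]
        )
        return int(np.argmax(scores - penalty))

    def update(self, k, x, reward):
        """Only the adapter that served the request receives feedback."""
        self.A[k] += np.outer(x, x)
        self.b[k] += reward * x
        self.pulls[k] += 1

    def refresh_cache(self):
        """Slow timescale: keep the most-used adapters resident (assumed heuristic)."""
        top = np.argsort(-self.pulls, kind="stable")[: self.cache_size]
        self.resident = {int(k) for k in top}


# Usage: four adapters, two cache slots, a fixed two-dimensional context.
policy = CacheAwareLinUCB(n_adapters=4, dim=2, cache_size=2, page_cost=0.1)
x = np.array([1.0, 0.0])
first = policy.route(x)           # all estimates tie, so a resident adapter wins
for _ in range(5):
    policy.update(3, x, reward=1.0)   # pretend adapter 3 keeps earning reward 1
policy.refresh_cache()            # adapter 3 is promoted into the cache
```

In this sketch the paging penalty makes exploring a non-resident adapter more expensive, while the router's choices determine which adapters accumulate the statistics that the cache refresh relies on. That mirrors the coupling the abstract describes, under the stated assumptions.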
