Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains

Ibne Farabi Shihab; Sanjeda Akter; Anuj Sharma

arXiv:2505.07274 · cs.LG · September 30, 2025

TL;DR

This paper introduces a cache-efficient posterior sampling framework for reinforcement learning with LLM-derived priors, cutting LLM queries by 3.8–4.7× while retaining 96–98% of uncached performance across discrete text and continuous control domains.

Contribution

It proposes an adaptive caching mechanism with meta-optimized parameters, enabling efficient inference and extending to offline RL with notable performance improvements.

Findings

01. Achieves 3.8–4.7× reduction in LLM queries
02. Reduces median latency by 4.0–12.0× on consumer GPU
03. Maintains 96–98% of uncached performance
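The query-reduction figure can be read back as a cache hit rate. Under a simplifying assumption that is ours, not stated on this page (every lookup is either a free cache hit or a miss that costs exactly one LLM query), a reduction factor r corresponds to a hit rate h = 1 - 1/r. The helper name `implied_hit_rate` is hypothetical.

```python
def implied_hit_rate(reduction: float) -> float:
    """Hit rate h implied by a query-reduction factor r.

    Assumes (hypothetically) that each cache miss triggers exactly one LLM
    query and each hit triggers none, so r = lookups / misses = 1 / (1 - h).
    """
    if reduction < 1:
        raise ValueError("reduction factor must be >= 1")
    return 1.0 - 1.0 / reduction


# The reported range of query reductions, converted to implied hit rates.
for r in (3.8, 4.7):
    print(f"{r}x fewer queries -> hit rate ~ {implied_hit_rate(r):.1%}")
# 3.8x fewer queries -> hit rate ~ 73.7%
# 4.7x fewer queries -> hit rate ~ 78.7%
```

Under this model, the reported 3.8–4.7× range corresponds to roughly three out of four prior lookups being served from the cache.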

Abstract

Integrating large language models (LLMs) as priors in reinforcement learning (RL) offers significant advantages but comes with substantial computational costs. We present a principled cache-efficient framework for posterior sampling with LLM-derived priors that dramatically reduces these costs while maintaining high performance. At the core of our approach is an adaptive caching mechanism, where cache parameters are meta-optimized using surrogate gradients derived from policy performance. This design enables efficient inference across both discrete text environments (e.g., TextWorld, ALFWorld) and continuous control domains (e.g., MuJoCo), achieving a 3.8–4.7× reduction in LLM queries and 4.0–12.0× lower median latencies (85–93 ms on a consumer GPU) while retaining 96–98% of uncached performance. Our theoretical analysis provides KL divergence bounds on approximation…
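The abstract does not specify the cache key, matching rule, or eviction policy. The sketch below shows one plausible shape for a cache placed in front of LLM prior queries, under assumptions that are ours: states are embedding vectors, a lookup hits when a cached state has cosine similarity of at least `threshold`, and eviction is FIFO. `PriorCache`, `llm_prior`, `threshold`, and `capacity` are hypothetical names. Unlike the paper, which meta-optimizes its cache parameters with surrogate gradients, the sketch keeps the threshold fixed.

```python
import math
from collections import deque
from typing import Callable, Deque, Dict, Optional, Sequence, Tuple

State = Tuple[float, ...]   # hypothetical: a state embedding vector
Prior = Dict[str, float]    # hypothetical: action -> prior probability


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class PriorCache:
    """Hypothetical similarity cache in front of an expensive LLM prior query.

    A lookup is a hit when some cached state has cosine similarity
    >= threshold with the query state. Otherwise the LLM is called, and the
    result is stored. Once capacity is reached, the oldest entry is evicted.
    """

    def __init__(self, llm_prior: Callable[[State], Prior],
                 threshold: float = 0.95, capacity: int = 1024) -> None:
        self.llm_prior = llm_prior
        # Fixed in this sketch; the paper meta-optimizes its cache parameters.
        self.threshold = threshold
        self.entries: Deque[Tuple[State, Prior]] = deque(maxlen=capacity)
        self.hits = 0
        self.misses = 0

    def lookup(self, state: State) -> Prior:
        # Linear scan for the most similar cached state. This keeps the
        # sketch simple; a large cache would likely need a nearest-neighbour index.
        best: Optional[Prior] = None
        best_sim = -math.inf
        for key, prior in self.entries:
            sim = cosine(state, key)
            if sim > best_sim:
                best_sim, best = sim, prior
        if best is not None and best_sim >= self.threshold:
            self.hits += 1
            return best
        # Cache miss: pay for one LLM query and remember the result.
        self.misses += 1
        prior = self.llm_prior(state)
        self.entries.append((state, prior))  # deque drops the oldest when full
        return prior

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With a stub such as `lambda s: {"go north": 0.7, "open door": 0.3}` standing in for the LLM call, two nearly identical states (for example `(1.0, 0.0)` and `(0.99, 0.05)`) trigger only one query. In this toy setting, the query-reduction factor equals `(hits + misses) / misses`.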

Videos

Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains (Underline)

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications