Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains
Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

TL;DR
This paper introduces a cache-efficient posterior sampling framework for reinforcement learning with LLM-derived priors, cutting LLM queries by 3.8–4.7× while retaining 96–98% of uncached performance across discrete text and continuous control domains.
Contribution
It proposes an adaptive caching mechanism with meta-optimized parameters that enables efficient inference, and extends the framework to offline RL with notable performance improvements.
Findings
Achieves a 3.8–4.7× reduction in LLM queries
Reduces median latency by 4.0–12.0× on a consumer GPU
Maintains 96–98% of uncached performance
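As a rough reading of the query-reduction figure: if every prior lookup either triggers exactly one LLM query (a cache miss) or is served from the cache (a hit), a reduction factor r implies a cache hit rate of 1 − 1/r. This one-query-per-miss accounting is an assumption made for illustration, not a statistic reported by the authors, and `implied_hit_rate` is a hypothetical helper.

```python
def implied_hit_rate(reduction_factor):
    """Fraction of lookups served from cache, assuming one LLM query per cache miss."""
    # queries_cached = queries_uncached / r, so misses are 1/r of lookups
    return 1.0 - 1.0 / reduction_factor


for r in (3.8, 4.7):
    print(f"{r}x fewer queries -> {implied_hit_rate(r):.1%} hit rate")
# 3.8x fewer queries -> 73.7% hit rate
# 4.7x fewer queries -> 78.7% hit rate
```

Under this assumption, roughly three out of every four prior lookups would avoid an LLM call.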
Abstract
Integrating large language models (LLMs) as priors in reinforcement learning (RL) offers significant advantages but comes with substantial computational costs. We present a principled cache-efficient framework for posterior sampling with LLM-derived priors that dramatically reduces these costs while maintaining high performance. At the core of our approach is an adaptive caching mechanism, where cache parameters are meta-optimized using surrogate gradients derived from policy performance. This design enables efficient inference across both discrete text environments (e.g., TextWorld, ALFWorld) and continuous control domains (e.g., MuJoCo), achieving a 3.8–4.7× reduction in LLM queries and 4.0–12.0× lower median latencies (85–93 ms on a consumer GPU) while retaining 96–98% of uncached performance. Our theoretical analysis provides KL divergence bounds on approximation…
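The caching idea in the abstract can be illustrated with a minimal sketch. It assumes a similarity cache keyed on unit-normalized state embeddings, a fixed cosine-similarity threshold (the paper instead meta-optimizes its cache parameters with surrogate gradients, which is not shown here), FIFO eviction, and a policy of the form π(a|s) ∝ prior(a|s)·exp(Q(s,a)/τ). `PriorCache`, `llm_prior_fn`, and `sample_action` are hypothetical names; this is not the authors' implementation.

```python
import numpy as np


class PriorCache:
    """Hypothetical similarity cache for LLM action priors (illustrative only)."""

    def __init__(self, llm_prior_fn, threshold=0.95, capacity=1024):
        self.llm_prior_fn = llm_prior_fn  # hypothetical: embedding -> action probabilities
        self.threshold = threshold  # fixed here; the paper meta-optimizes cache parameters
        self.capacity = capacity
        self.keys = []    # unit-normalized state embeddings
        self.values = []  # cached prior distributions
        self.llm_calls = 0

    def prior(self, emb):
        emb = np.asarray(emb, dtype=float)
        unit = emb / np.linalg.norm(emb)
        if self.keys:
            sims = np.stack(self.keys) @ unit  # cosine similarity to every cached key
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                return self.values[best]  # cache hit: no LLM query
        probs = np.asarray(self.llm_prior_fn(emb), dtype=float)  # cache miss
        self.llm_calls += 1
        if len(self.keys) >= self.capacity:  # evict the oldest entry (FIFO)
            self.keys.pop(0)
            self.values.pop(0)
        self.keys.append(unit)
        self.values.append(probs)
        return probs


def sample_action(cache, emb, q_values, tau=1.0, rng=None):
    """Sample a ~ pi(a|s) proportional to prior(a|s) * exp(Q(s, a) / tau)."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.log(cache.prior(emb) + 1e-12) + np.asarray(q_values, dtype=float) / tau
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


if __name__ == "__main__":
    fake_llm = lambda e: np.array([0.7, 0.2, 0.1])  # stand-in for a real LLM call
    cache = PriorCache(fake_llm, threshold=0.95)
    rng = np.random.default_rng(0)
    for emb in ([1.0, 0.0], [0.99, 0.05], [0.0, 1.0]):
        sample_action(cache, emb, q_values=[0.0, 0.5, 1.0], rng=rng)
    print(cache.llm_calls)  # 2: the second state reuses the first cached prior
```

The threshold trades query savings against prior fidelity: lowering it raises the hit rate but serves priors computed for less similar states, which is the kind of approximation error the paper's KL bounds are meant to quantify.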
Taxonomy
Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications
