Dynamic Memory for Interpretable Sequential Optimisation
Srivas Chennu, Andrew Maher, Jamie Martin, Subash Prabanantham

TL;DR
This paper introduces an adaptive Bayesian learning agent with dynamic memory for non-stationary reinforcement learning tasks, emphasizing interpretability and robustness in real-world recommendation systems.
Contribution
It proposes a novel dynamic memory approach that enables interpretable, adaptive optimization in non-stationary environments, suitable for large-scale deployment.
Findings
The agent adapts correctly to real changes in rewards across multiple scenarios.
It balances interpretability and performance, prioritizing understanding over minimal regret.
Simulation results demonstrate robustness against various non-stationary conditions.
Abstract
Real-world applications of reinforcement learning for recommendation and experimentation face a practical challenge: the relative reward of different bandit arms can evolve over the lifetime of the learning agent. To deal with these non-stationary cases, the agent must forget some historical knowledge, as it may no longer be relevant to minimise regret. We present a solution to handling non-stationarity that is suitable for deployment at scale, to provide business operators with automated adaptive optimisation. Our solution aims to provide interpretable learning that can be trusted by humans, whilst responding to non-stationarity to minimise regret. To this end, we develop an adaptive Bayesian learning agent that employs a novel form of dynamic memory. It enables interpretability through statistical hypothesis testing, by targeting a set point of statistical power when comparing…
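The idea of forgetting historical knowledge in a Bayesian bandit can be illustrated with a minimal sketch. It is not the paper's dynamic memory, whose details are truncated here: it swaps in a simpler, well-known technique, Beta-Bernoulli Thompson sampling with a fixed exponential discount on past evidence. The class name `ForgettingBetaAgent`, the `decay` parameter, and all method names are hypothetical and chosen for illustration only.

```python
import random


class ForgettingBetaAgent:
    """Illustrative Beta-Bernoulli Thompson sampler with exponential forgetting.

    Assumption: a fixed discount factor stands in for the paper's dynamic
    memory, which is not specified in the text above.
    """

    def __init__(self, n_arms, decay=0.99, seed=0):
        self.decay = decay                   # fraction of old evidence kept per step
        self.successes = [0.0] * n_arms      # discounted count of reward = 1
        self.failures = [0.0] * n_arms       # discounted count of reward = 0
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one sample from each arm's Beta posterior (uniform Beta(1, 1) prior)
        # and play the arm with the largest sample.
        samples = [
            self.rng.betavariate(1.0 + s, 1.0 + f)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Shrink all accumulated evidence, then add the new observation,
        # so older rewards carry geometrically less weight.
        self.successes = [s * self.decay for s in self.successes]
        self.failures = [f * self.decay for f in self.failures]
        self.successes[arm] += reward
        self.failures[arm] += 1 - reward

    def posterior_mean(self, arm):
        a = 1.0 + self.successes[arm]
        b = 1.0 + self.failures[arm]
        return a / (a + b)

    def effective_memory(self, arm):
        # Discounted number of observations currently remembered for this arm.
        return self.successes[arm] + self.failures[arm]


if __name__ == "__main__":
    agent = ForgettingBetaAgent(n_arms=2, decay=0.95, seed=1)
    true_rates = [0.7, 0.3]
    for step in range(2000):
        if step == 1000:
            true_rates = [0.3, 0.7]          # the better arm changes mid-run
        arm = agent.select_arm()
        reward = 1 if agent.rng.random() < true_rates[arm] else 0
        agent.update(arm, reward)
    print([round(agent.posterior_mean(a), 2) for a in range(2)])
```

With a fixed `decay`, the remembered evidence per arm is capped at 1 / (1 - decay) observations, so the posterior never becomes too confident to react to a change. The cost is a permanently wider posterior in stationary periods, which is the trade-off an adaptive memory is meant to manage.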
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Data Classification
