Reinforcement Learning with Algorithms from Probabilistic Structure Estimation
Jonathan P. Epperlein, Roman Overko, Sergiy Zhuk, Christopher King, Djallel Bouneffouf, Andrew Cullen, Robert Shorten

TL;DR
This paper introduces a probabilistic structure estimation method for reinforcement learning that adaptively chooses between a lightweight myopic (contextual-bandit) algorithm and a more complex MDP-based algorithm, depending on whether the agent's actions affect the environment, for settings where this is not known in advance.
Contribution
It proposes a likelihood-ratio test-based framework to automatically select the appropriate RL algorithm without prior environment assumptions.
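As a rough illustration of what a likelihood-ratio test for this decision can look like, the sketch below compares two models of observed transitions: one where the next state depends only on the current state (bandit-like) and one where it also depends on the action (MDP-like). This is a generic textbook-style test, not necessarily the statistic used in the paper; the function names, the toy two-state environment, and the significance level are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def lr_statistic(transitions):
    """Return G = 2 * (max log-lik under H1 - max log-lik under H0).

    transitions: iterable of (state, action, next_state) tuples.
    H0: next state depends on the current state only (bandit-like).
    H1: next state depends on the current state and action (MDP-like).
    """
    n_sas = defaultdict(int)  # counts of (s, a, s')
    n_sa = defaultdict(int)   # counts of (s, a)
    n_ss = defaultdict(int)   # counts of (s, s')
    n_s = defaultdict(int)    # counts of s
    for s, a, s2 in transitions:
        n_sas[(s, a, s2)] += 1
        n_sa[(s, a)] += 1
        n_ss[(s, s2)] += 1
        n_s[s] += 1
    g = 0.0
    for (s, a, s2), n in n_sas.items():
        p1 = n / n_sa[(s, a)]          # MLE of P(s' | s, a)
        p0 = n_ss[(s, s2)] / n_s[s]    # MLE of P(s' | s)
        g += 2.0 * n * math.log(p1 / p0)
    return g


def simulate(action_matters, steps=5000, seed=0):
    """Hypothetical toy environment: 2 states, 2 uniformly random actions."""
    rng = random.Random(seed)
    s, out = 0, []
    for _ in range(steps):
        a = rng.randrange(2)
        if action_matters:
            p_one = 0.9 if a == 1 else 0.1  # action steers the next state
        else:
            p_one = 0.5                     # action is irrelevant
        s2 = 1 if rng.random() < p_one else 0
        out.append((s, a, s2))
        s = s2
    return out


# With 2 states and 2 actions, H1 has 2 more free parameters than H0, so G is
# asymptotically chi-square with 2 degrees of freedom under H0. For 2 degrees
# of freedom the upper-alpha quantile has the closed form -2 * ln(alpha).
ALPHA = 0.001  # assumed significance level
THRESHOLD = -2.0 * math.log(ALPHA)

g_bandit = lr_statistic(simulate(action_matters=False))
g_mdp = lr_statistic(simulate(action_matters=True))
use_mdp_algorithm = g_mdp > THRESHOLD
```

In this sketch, a statistic above `THRESHOLD` is taken as evidence that actions influence transitions, so an MDP-based algorithm would be selected; otherwise the myopic bandit algorithm is kept. The closed-form threshold only applies to the two-degree-of-freedom case used here.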
Findings
The framework can effectively distinguish when myopic policies are optimal.
The proposed method provides a bound on regret in adaptive RL settings.
Simulations validate the approach in real-world scenarios.
Abstract
Reinforcement learning (RL) algorithms aim to learn optimal decisions in unknown environments through experience of taking actions and observing the rewards gained. In some cases, the environment is not influenced by the actions of the RL agent, in which case the problem can be modeled as a contextual multi-armed bandit and lightweight myopic algorithms can be employed. On the other hand, when the RL agent's actions affect the environment, the problem must be modeled as a Markov decision process and more complex RL algorithms are required which take the future effects of actions into account. Moreover, in practice, it is often unknown from the outset whether or not the agent's actions will impact the environment and it is therefore not possible to determine which RL algorithm is most fitting. In this work, we propose to avoid this difficult decision entirely and incorporate a choice…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Smart Grid Energy Management
