Active exploration in parameterized reinforcement learning
Mehdi Khamassi, Costas Tzafestas

TL;DR
This paper introduces an active exploration algorithm for parameterized reinforcement learning in continuous action spaces, dynamically adjusting exploration parameters to improve performance in non-stationary environments.
Contribution
It proposes a novel meta-learning-based method that automatically tunes exploration parameters in structured continuous action spaces for RL.
Findings
Outperforms non-active exploration RL methods in a virtual human-robot interaction task.
Demonstrates the effectiveness of adaptive exploration in non-stationary environments.
Shows that meta-learning can effectively tune exploration parameters for better performance.
Abstract
Online model-free reinforcement learning (RL) methods with continuous actions are playing a prominent role in real-world applications such as robotics. However, when confronted with non-stationary environments, these methods crucially rely on an exploration-exploitation trade-off that is rarely adjusted dynamically and automatically to changes in the environment. Here we propose an active exploration algorithm for RL in structured (parameterized) continuous action space. This framework deals with a set of discrete actions, each of which is parameterized with continuous variables. Discrete exploration is controlled through a Boltzmann softmax function with an inverse temperature parameter. In parallel, a Gaussian exploration is applied to the continuous action parameters. We apply a meta-learning algorithm based on the comparison between variations of short-term and…
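The two exploration mechanisms named in the abstract can be illustrated with a minimal sketch. It assumes the standard Boltzmann form, in which discrete action a is chosen with probability proportional to exp(beta * Q(a)), and a Gaussian centred on a per-action mean for the continuous parameter. The function names, the one-dimensional parameter, and the example values are hypothetical and not taken from the paper. beta and sigma are held fixed here, whereas the paper adjusts such exploration parameters online through meta-learning, which this sketch does not attempt to reproduce.

```python
import math
import random


def softmax_probs(q_values, beta):
    """Boltzmann softmax over discrete actions; beta is the inverse temperature."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_parameterized_action(q_values, param_means, beta, sigma, rng):
    """Pick a discrete action, then sample its continuous parameter.

    q_values:    assumed value estimate for each discrete action
    param_means: assumed current mean of each action's continuous parameter
    beta:        inverse temperature (higher means greedier discrete choice)
    sigma:       standard deviation of the Gaussian parameter exploration
    """
    probs = softmax_probs(q_values, beta)
    action = rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    param = rng.gauss(param_means[action], sigma)
    return action, param


rng = random.Random(0)
q_values = [0.2, 0.5, 0.1]       # hypothetical value estimates
param_means = [0.0, 1.0, -1.0]   # hypothetical parameter means
action, param = select_parameterized_action(
    q_values, param_means, beta=5.0, sigma=0.1, rng=rng
)
print(action, round(param, 3))
```

With beta = 0 the discrete choice is uniform, and as beta grows it approaches the greedy action. Likewise, sigma = 0 returns the mean parameter unchanged, and larger sigma widens the search around it. These two knobs are the kind of exploration parameters an active exploration scheme would raise or lower as the environment changes.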
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Advanced Bandit Algorithms Research
Methods: Gaussian Process · Q-Learning · Softmax
