Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors
Emma Cramer, Bernd Frauenknecht, Ramil Sabirov, Sebastian Trimpe

TL;DR
This paper introduces CHEQ, an adaptive hybrid RL algorithm that dynamically combines a prior controller with RL based on uncertainty, leading to faster, safer, and more transferable learning in complex tasks.
Contribution
The paper proposes CHEQ, a novel adaptive hybrid RL method that adjusts the influence of control priors during training using ensemble uncertainty, improving efficiency and safety.
Findings
CHEQ outperforms state-of-the-art methods in data efficiency.
CHEQ enhances exploration safety during training.
CHEQ demonstrates better transferability to new scenarios.
Abstract
Combining Reinforcement Learning (RL) with a prior controller can yield the best out of two worlds: RL can solve complex nonlinear problems, while the control prior ensures safer exploration and speeds up training. Prior work largely blends both components with a fixed weight, neglecting that the RL agent's performance varies with the training progress and across regions in the state space. Therefore, we advocate for an adaptive strategy that dynamically adjusts the weighting based on the RL agent's current capabilities. We propose a new adaptive hybrid RL algorithm, Contextualized Hybrid Ensemble Q-learning (CHEQ). CHEQ combines three key ingredients: (i) a time-invariant formulation of the adaptive hybrid RL problem treating the adaptive weight as a context variable, (ii) a weight adaption mechanism based on the parametric uncertainty of a critic ensemble, and (iii) ensemble-based…
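The first two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The clipped linear mapping from uncertainty to weight, the use of the standard deviation as the uncertainty measure, the thresholds `u_low` and `u_high`, the bounds `w_min` and `w_max`, and all function names are assumptions made for illustration only.

```python
from statistics import pstdev


def weight_from_uncertainty(q_values, u_low=0.05, u_high=0.5, w_min=0.2, w_max=1.0):
    """Map critic-ensemble disagreement to an RL weight in [w_min, w_max].

    Assumption: uncertainty is the population standard deviation of the
    ensemble's Q estimates, and the weight falls linearly from w_max to
    w_min as uncertainty rises from u_low to u_high.
    """
    u = pstdev(q_values)  # disagreement among the ensemble members
    frac = (u_high - u) / (u_high - u_low)
    frac = min(1.0, max(0.0, frac))  # clip to [0, 1]
    return w_min + frac * (w_max - w_min)


def blend_actions(a_rl, a_prior, w):
    """Convex combination of the RL action and the control-prior action."""
    return [w * r + (1.0 - w) * p for r, p in zip(a_rl, a_prior)]


def contextualize(observation, w):
    """Append the weight to the observation so the agent sees it as context."""
    return list(observation) + [w]


# Hypothetical Q estimates from a five-member critic ensemble.
confident = [1.00, 1.01, 0.99, 1.00, 1.00]
uncertain = [0.2, 1.5, -0.4, 2.0, 0.9]

w_hi = weight_from_uncertainty(confident)  # low disagreement: trust the RL agent
w_lo = weight_from_uncertainty(uncertain)  # high disagreement: lean on the prior

action = blend_actions(a_rl=[1.0, -1.0], a_prior=[0.0, 0.0], w=w_lo)
augmented_obs = contextualize([0.3, 0.7], w_lo)
```

When the ensemble members agree, the weight approaches `w_max` and the RL action dominates. When they disagree, the weight drops toward `w_min` and the prior controller takes over. Appending the weight to the observation mirrors the idea of treating it as a context variable.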
Taxonomy
Topics: Cognitive Science and Mapping
Methods: Q-Learning
