Boosting deep Reinforcement Learning using pretraining with Logical Options
Zihan Ye, Phil Chau, Raban Emunds, Jannis Blüml, Cedric Derstroff, Quentin Delfosse, Oleg Arenz, Kristian Kersting

TL;DR
This paper introduces H^2RL, a hybrid hierarchical reinforcement learning method that pretrains agents with logical options to improve long-term decision-making, combining symbolic structure with deep neural policies.
Contribution
The paper presents a novel two-stage hybrid framework that integrates symbolic logical options into deep reinforcement learning to enhance long-horizon planning.
Findings
H^2RL outperforms baseline methods in long-horizon tasks.
Pretraining with logical options reduces short-term reward loops.
The approach improves goal-directed behavior in complex environments.
Abstract
Deep reinforcement learning agents are often misaligned, as they over-exploit early reward signals. Recently, several symbolic approaches have addressed these challenges by encoding sparse objectives along with aligned plans. However, purely symbolic architectures are complex to scale and difficult to apply to continuous settings. Hence, we propose a hybrid approach, inspired by humans' ability to acquire new skills. We use a two-stage framework that injects symbolic structure into neural-based reinforcement learning agents without sacrificing the expressivity of deep policies. Our method, called Hybrid Hierarchical RL (H^2RL), introduces a logical option-based pretraining strategy to steer the learning policy away from short-term reward loops and toward goal-directed behavior while allowing the final policy to be refined via standard environment interaction. Empirically, we show that…
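The two-stage idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the corridor environment, the two hand-written options, the tabular Q-table, and every name below (`step`, `OPTIONS`, `pretrain`, `fine_tune`, `greedy_rollout`) are hypothetical and chosen only for illustration. Stage 1 is approximated by biasing a Q-table toward the actions that symbolic options select, and stage 2 by plain tabular Q-learning; the paper itself uses deep policies.

```python
import random

# Hypothetical toy corridor: cells 0..7, a key at 0, a goal at 7,
# and a small repeatable reward at cell 4 that can trap a greedy agent.
N, KEY, LOOP, GOAL = 8, 0, 4, 7
LEFT, RIGHT = 0, 1
START = (3, False)  # state = (position, has_key)

def step(state, action):
    """Return (next_state, reward, done)."""
    pos, has_key = state
    pos = max(0, pos - 1) if action == LEFT else min(N - 1, pos + 1)
    has_key = has_key or pos == KEY
    if pos == GOAL and has_key:
        return (pos, has_key), 10.0, True
    return (pos, has_key), (0.1 if pos == LOOP else 0.0), False

# Logical options as (name, precondition, policy) over the symbolic state.
OPTIONS = [
    ("get_key", lambda s: not s[1], lambda s: LEFT),
    ("reach_goal", lambda s: s[1], lambda s: RIGHT),
]

def option_action(state):
    """Action proposed by the first option whose precondition holds."""
    for _name, applicable, policy in OPTIONS:
        if applicable(state):
            return policy(state)
    raise ValueError("no applicable option")

def pretrain(bonus=1.0):
    """Stage 1 stand-in: bias Q-values toward option-chosen actions."""
    q = {(p, k): [0.0, 0.0] for p in range(N) for k in (False, True)}
    for s in q:
        q[s][option_action(s)] = bonus
    return q

def fine_tune(q, episodes=200, alpha=0.1, gamma=0.95, eps=0.1,
              seed=0, max_steps=50):
    """Stage 2 stand-in: epsilon-greedy tabular Q-learning on a copy of q."""
    rng = random.Random(seed)
    q = {s: list(v) for s, v in q.items()}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if rng.random() < eps:
                action = rng.choice((LEFT, RIGHT))
            else:
                action = max((LEFT, RIGHT), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

def greedy_rollout(q, max_steps=50):
    """Follow the greedy policy; return (reached_goal, total_reward)."""
    state, total = START, 0.0
    for _ in range(max_steps):
        action = max((LEFT, RIGHT), key=lambda a: q[state][a])
        state, reward, done = step(state, action)
        total += reward
        if done:
            return True, total
    return False, total
```

In this toy setting, calling `greedy_rollout(pretrain())` follows the option-induced plan (fetch the key, then walk to the goal), and `fine_tune(pretrain())` continues learning from that starting point through ordinary environment interaction. How closely this mirrors the pretraining objective in the paper is an assumption, since the abstract does not specify it.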
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Domain Adaptation and Few-Shot Learning
