Modular Deep Reinforcement Learning with Temporal Logic Specifications
Lim Zun Yuan, Mohammadhosein Hasanbeig, Alessandro Abate, Daniel Kroening

TL;DR
This paper introduces a modular deep reinforcement learning framework that leverages temporal logic specifications represented by finite-state machines to guide policy learning in sparse reward environments, demonstrated on a Mars rover task.
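To make the idea of a finite-state machine guiding learning under sparse reward concrete, here is a minimal sketch of one step of an on-the-fly product between a continuous-state MDP and a finite-state machine. Everything in it is an illustrative assumption, not the authors' code: the 2-D rover dynamics, the labelling function `label`, the regions "a" and "b", the specification "visit a, then visit b", and the choice to pay reward 1 only on first entry into an accepting FSM state. The names `FSM`, `SPEC`, `rover_dynamics` and `product_step` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Pos = Tuple[float, float]  # hypothetical 2-D rover position


def label(pos: Pos) -> FrozenSet[str]:
    """Hypothetical labelling function: propositions that hold at `pos`."""
    x, y = pos
    props = set()
    if (x - 2.0) ** 2 + (y - 2.0) ** 2 <= 0.25:
        props.add("a")  # assumed region of interest, e.g. a sample site
    if (x - 5.0) ** 2 + y ** 2 <= 0.25:
        props.add("b")  # assumed second region, e.g. a base station
    return frozenset(props)


@dataclass
class FSM:
    transitions: Dict[Tuple[int, str], int]  # (state, proposition) -> next state
    initial: int
    accepting: FrozenSet[int]

    def step(self, q: int, props: FrozenSet[str]) -> int:
        # Follow the first enabled edge; if none is enabled, stay in q.
        for p in sorted(props):
            if (q, p) in self.transitions:
                return self.transitions[(q, p)]
        return q


# Hypothetical specification "visit a, then visit b": 0 --a--> 1 --b--> 2.
SPEC = FSM(transitions={(0, "a"): 1, (1, "b"): 2}, initial=0, accepting=frozenset({2}))


def rover_dynamics(pos: Pos, action: Pos) -> Pos:
    """Hypothetical dynamics: displacement with each component clipped to [-1, 1]."""
    dx, dy = (max(-1.0, min(1.0, c)) for c in action)
    return (pos[0] + dx, pos[1] + dy)


def product_step(pos: Pos, q: int, action: Pos, spec: FSM = SPEC):
    """One transition of the product of the MDP and the FSM, built on the fly.

    Product states (pos, q) are never enumerated in advance: the FSM state is
    advanced from the label of each newly reached position.
    """
    next_pos = rover_dynamics(pos, action)
    next_q = spec.step(q, label(next_pos))
    # Sparse reward (an assumption of this sketch): 1 on first entry into an
    # accepting FSM state, 0 otherwise.
    reward = 1.0 if next_q in spec.accepting and q not in spec.accepting else 0.0
    return next_pos, next_q, reward
```

Starting from `((0.0, 0.0), SPEC.initial)` and applying the actions `(1, 1), (1, 1), (1, -1), (1, -1), (1, 0)`, the rover passes through region "a" (FSM state 0 → 1) and then reaches region "b" (state 1 → 2). Only the final step earns a non-zero reward. The FSM state therefore records progress that the sparse reward alone does not reveal.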
Contribution
It presents a novel integration of temporal logic with a modular DDPG architecture for continuous control in sparse reward settings.
Findings
Successful policy synthesis in Mars rover simulation
High success rate of the learned control policies
Effective use of temporal structure to guide reinforcement learning
Abstract
We propose an actor-critic, model-free, and online Reinforcement Learning (RL) framework for continuous-state continuous-action Markov Decision Processes (MDPs) when the reward is highly sparse but encompasses a high-level temporal structure. We represent this temporal structure by a finite-state machine and construct an on-the-fly synchronised product of the MDP and the finite-state machine. The temporal structure acts as a guide for the RL agent within the product, where a modular Deep Deterministic Policy Gradient (DDPG) architecture is proposed to generate a low-level control policy. We evaluate our framework on a Mars rover experiment and report the success rate of the synthesised policy.
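One way to picture the modular DDPG architecture is as one actor-critic pair, plus its target copy, per finite-state-machine state, with the current FSM state selecting which actor issues the continuous action. The sketch below rests on several assumptions and is not the paper's implementation. Linear maps stand in for the neural networks. The critic target is assumed to bootstrap from the target module of the next FSM state, y = r + γ Q′_{q′}(s′, μ′_{q′}(s′)). Gradient updates and the replay buffer are omitted. `LinearModule`, `ModularDDPG` and their methods are hypothetical names.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List

Vec = List[float]


@dataclass
class LinearModule:
    """Hypothetical stand-in for one actor-critic pair (neural networks in practice)."""
    actor_w: List[Vec]  # deterministic policy: a = actor_w @ s
    critic_w: Vec       # action-value: Q(s, a) = critic_w . [s, a]

    def act(self, s: Vec) -> Vec:
        return [sum(w * x for w, x in zip(row, s)) for row in self.actor_w]

    def q(self, s: Vec, a: Vec) -> float:
        return sum(w * x for w, x in zip(self.critic_w, list(s) + list(a)))


class ModularDDPG:
    """One actor-critic module (plus a target copy) per FSM state."""

    def __init__(self, modules: Dict[int, LinearModule], gamma: float = 0.99, tau: float = 0.005):
        self.modules = modules
        self.targets = copy.deepcopy(modules)  # slowly tracking target modules
        self.gamma = gamma
        self.tau = tau

    def act(self, s: Vec, q: int) -> Vec:
        # The current FSM state selects which actor issues the continuous action.
        return self.modules[q].act(s)

    def critic_target(self, r: float, s_next: Vec, q_next: int, done: bool) -> float:
        # y = r + gamma * Q'_{q'}(s', mu'_{q'}(s')), bootstrapping from the
        # target module of the NEXT FSM state q' (assumed for this sketch).
        if done:
            return r
        target = self.targets[q_next]
        return r + self.gamma * target.q(s_next, target.act(s_next))

    def soft_update(self) -> None:
        # Polyak averaging of every target module towards its online module.
        for q, online in self.modules.items():
            t = self.targets[q]
            t.actor_w = [
                [(1 - self.tau) * tw + self.tau * w for tw, w in zip(t_row, row)]
                for t_row, row in zip(t.actor_w, online.actor_w)
            ]
            t.critic_w = [
                (1 - self.tau) * tw + self.tau * w
                for tw, w in zip(t.critic_w, online.critic_w)
            ]
```

For example, with modules for FSM states 0 and 1, the same rover state can produce different actions depending on which stage of the specification the agent is in. When a transition moves the FSM from state 0 to state 1, the value propagated back to module 0 comes from module 1's critic. This lets each module specialise on one stage of the task while the value of completing later stages still flows back to earlier ones.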
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Advanced Software Engineering Methodologies
