Policy Search using Dynamic Mirror Descent MPC for Model Free Off Policy RL

Soumya Rani Samineni

arXiv:2110.12239 · cs.LG · October 26, 2021

TL;DR

This paper introduces a hierarchical framework combining dynamic mirror descent MPC with off-policy RL to improve sample efficiency and convergence speed in model-free RL, demonstrated on MuJoCo and classical control tasks.

Contribution

The paper proposes a hierarchical framework that integrates MPC-based trajectory optimization with off-policy RL, and derives two new algorithms from it, DeMoRL and DeMo Layer, aimed at improving sample efficiency and online adaptation.

Findings

01. DeMoRL converges faster than existing methods.

02. DeMoRL achieves better or comparable performance on MuJoCo benchmarks.

03. DeMo Layer effectively trains simple policies on classical control tasks.

Abstract

Recent works in Reinforcement Learning (RL) combine model-free (Mf)-RL algorithms with model-based (Mb)-RL approaches to get the best of both: the asymptotic performance of Mf-RL and the high sample efficiency of Mb-RL. Inspired by these works, we propose a hierarchical framework that integrates online learning for the Mb-trajectory optimization with off-policy methods for the Mf-RL. In particular, two loops are proposed, where Dynamic Mirror Descent based Model Predictive Control (DMD-MPC) is used as the inner loop to obtain an optimal sequence of actions. These actions are in turn used to significantly accelerate the outer-loop Mf-RL. We show that our formulation is generic for a broad class of MPC-based policies and objectives, and includes some of the well-known Mb-Mf approaches. Based on the framework, we define two algorithms to increase sample efficiency of Off Policy RL and to…
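The two-loop structure described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a hypothetical 1-D toy model (`toy_dynamics`, `toy_cost`), a Gaussian distribution over action sequences, and an exponentiated-cost (MPPI-style) weighting as one sampling-based stand-in for the DMD-MPC update. All function names, parameters, and the replay buffer layout are illustrative assumptions. The inner loop refines a planned action sequence, and the outer loop only collects the resulting transitions, as an off-policy learner would consume them.

```python
import numpy as np

def toy_dynamics(state, action):
    # Hypothetical 1-D model (assumption): the action nudges the state.
    return state + 0.1 * action

def toy_cost(state, action):
    # Hypothetical quadratic cost (assumption): drive the state to zero.
    return state ** 2 + 0.01 * action ** 2

def rollout_cost(state, actions):
    # Accumulate the model-predicted cost of one candidate action sequence.
    total = 0.0
    for action in actions:
        total += toy_cost(state, action)
        state = toy_dynamics(state, action)
    return total

def inner_loop_update(state, mean, rng, n_samples=64, sigma=0.5,
                      temperature=2.0, step_size=0.5):
    # Sample action sequences around the current plan (the Gaussian mean).
    noise = rng.normal(0.0, sigma, size=(n_samples, mean.shape[0]))
    candidates = mean + noise
    costs = np.array([rollout_cost(state, c) for c in candidates])
    # Exponentiated-cost weights; subtracting the minimum keeps exp() stable.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # Move the plan part of the way toward the weighted average of samples.
    return (1.0 - step_size) * mean + step_size * (weights @ candidates)

def shift_plan(mean):
    # Shift the plan one step forward in time and repeat the last action.
    return np.append(mean[1:], mean[-1])

def run_episode(initial_state=2.0, horizon=10, steps=30, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros(horizon)
    state = initial_state
    replay_buffer = []  # transitions an off-policy learner could train on
    for _ in range(steps):
        mean = inner_loop_update(state, mean, rng)   # inner loop: planning
        action = mean[0]
        next_state = toy_dynamics(state, action)
        reward = -toy_cost(state, action)
        replay_buffer.append((state, action, reward, next_state))
        state = next_state
        mean = shift_plan(mean)
    return state, replay_buffer
```

Calling `run_episode()` returns the final state and the list of collected transitions. In a full system the buffer would feed an off-policy update in the outer loop, which this sketch deliberately leaves out.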

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Control Systems Optimization