Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs

Siow Meng Low; Akshat Kumar; Scott Sanner

arXiv:2203.12679 · cs.AI · March 25, 2022 · 1 citation


TL;DR

This paper introduces ILBO, a novel iterative lower bound optimization method for deep reactive policies in continuous MDPs, significantly improving sample efficiency and solution quality over existing approaches.

Contribution

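The paper proposes iterative lower bound optimization (ILBO), a framework that takes a minorization-maximization view of planning with deep reactive policies (DRPs): instead of optimizing the overall model-based objective directly, it iteratively optimizes the DRP against a locally tight lower bound of that objective, with the aim of reducing sample complexity and improving solution quality.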

Findings

01. ILBO outperforms state-of-the-art DRP planners in sample efficiency.

02. ILBO produces solutions with lower variance and higher quality.

03. ILBO generalizes well to new problem instances without retraining.

Abstract

Recent advances in deep learning have enabled optimization of deep reactive policies (DRPs) for continuous MDP planning by encoding a parametric policy as a deep neural network and exploiting automatic differentiation in an end-to-end model-based gradient descent framework. This approach has proven effective for optimizing DRPs in nonlinear continuous MDPs, but it requires a large number of sampled trajectories to learn effectively and can suffer from high variance in solution quality. In this work, we revisit the overall model-based DRP objective and instead take a minorization-maximization perspective to iteratively optimize the DRP w.r.t. a locally tight lower-bounded objective. This novel formulation of DRP learning as iterative lower bound optimization (ILBO) is particularly appealing because (i) each step is structurally easier to optimize than the overall objective, (ii) it…
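The minorization-maximization idea mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's lower bound or its DRP objective; it assumes a hypothetical scalar objective `-log(cosh(theta - 2))` and uses a standard quadratic minorizer, which is valid here because the second derivative of this objective is bounded below by -1. All function and constant names are illustrative assumptions.

```python
import math

# Hypothetical toy objective (NOT the DRP planning objective from the paper).
# It is concave, maximized at theta = 2, and its second derivative lies in [-1, 0].
def objective(theta):
    return -math.log(math.cosh(theta - 2.0))

def gradient(theta):
    return -math.tanh(theta - 2.0)

# Assumed curvature bound: |f''(theta)| <= 1 for this toy objective.
CURVATURE = 1.0

def lower_bound(theta, anchor):
    """Quadratic minorizer of `objective`, tight at theta == anchor."""
    d = theta - anchor
    return objective(anchor) + gradient(anchor) * d - 0.5 * CURVATURE * d * d

def mm_step(anchor):
    """Closed-form maximizer of lower_bound(., anchor)."""
    return anchor + gradient(anchor) / CURVATURE

def run_mm(theta0, num_iters=50):
    """Repeatedly build the bound at the current iterate and maximize it."""
    history = [theta0]
    for _ in range(num_iters):
        history.append(mm_step(history[-1]))
    return history

history = run_mm(theta0=-3.0)
values = [objective(t) for t in history]
```

Because each surrogate touches the objective at the current iterate and lies below it everywhere else, maximizing the surrogate cannot decrease the true objective, which is the general mechanism behind monotone improvement in minorization-maximization schemes. In this toy case the surrogate has a closed-form maximizer; in a setting like the one the abstract describes, each surrogate would presumably be optimized by gradient-based methods instead.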



Taxonomy

Topics: Machine Learning and Algorithms · Oil and Gas Production Techniques