Model-Based Offline Planning with Trajectory Pruning

Xianyuan Zhan; Xiangyu Zhu; Haoran Xu

arXiv:2105.07351 · cs.AI · April 22, 2022 · 5 cites

TL;DR

This paper introduces MOPP, a lightweight model-based offline planning framework that improves performance by combining aggressive trajectory rollouts with trajectory pruning, addressing practical challenges of applying offline RL to real-world systems.

Contribution

MOPP is a novel offline planning framework that balances trajectory exploration and pruning, enhancing offline RL performance in real-world control tasks.

Findings

01 MOPP achieves results competitive with existing offline RL methods.

02 Trajectory pruning improves planning robustness.

03 Aggressive rollout guided by a learned behavior policy enhances performance.

Abstract

Recent offline reinforcement learning (RL) studies have made much progress toward making RL usable in real-world systems by learning policies from pre-collected datasets without environment interaction. Unfortunately, existing offline RL methods still face many practical challenges in real-world system control tasks, such as computational restrictions during agent training and the requirement of extra control flexibility. The model-based planning framework provides an attractive alternative. However, most model-based planning algorithms are not designed for offline settings. Simply combining the ingredients of offline RL with existing methods either provides over-restrictive planning or leads to inferior performance. We propose a new lightweight model-based offline planning framework, namely MOPP, which tackles the dilemma between the restrictions of offline learning and…
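The rollout-and-pruning idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the function `plan_action`, the ensemble-disagreement pruning rule, the exponential return weighting, and every parameter name and default (`std_scale`, `max_disagreement`, `kappa`, and so on) are assumptions chosen for illustration, and the toy policy, dynamics, and reward are hypothetical stand-ins for models that would be learned from an offline dataset.

```python
import numpy as np

def plan_action(state, behavior_policy, ensemble, reward_fn, rng,
                n_traj=64, horizon=5, std_scale=2.0,
                max_disagreement=0.5, kappa=1.0):
    """Pick one action from behavior-guided rollouts with uncertainty pruning.

    All names and defaults are illustrative assumptions, not the paper's.
    """
    states = np.repeat(state[None, :], n_traj, axis=0)  # (n_traj, state_dim)
    returns = np.zeros(n_traj)
    worst = np.zeros(n_traj)  # largest ensemble disagreement per trajectory
    first_actions = None
    for t in range(horizon):
        # Sample around the behavior policy with an inflated std
        # (the "aggressive" part of the rollout).
        mean, std = behavior_policy(states)
        actions = mean + std_scale * std * rng.standard_normal(mean.shape)
        if t == 0:
            first_actions = actions.copy()
        # Ensemble predictions: (n_models, n_traj, state_dim).
        preds = np.stack([f(states, actions) for f in ensemble])
        disagreement = preds.std(axis=0).max(axis=-1)
        worst = np.maximum(worst, disagreement)
        returns += reward_fn(states, actions)
        states = preds.mean(axis=0)
    # Prune trajectories that passed through high-uncertainty regions.
    keep = worst <= max_disagreement
    if not keep.any():
        keep = worst == worst.min()  # fallback: keep the least uncertain
    r = returns[keep]
    w = np.exp(kappa * (r - r.max()))  # return-weighted averaging
    return (w[:, None] * first_actions[keep]).sum(axis=0) / w.sum()

# Hypothetical toy problem: 2-D state, action added to the state.
def toy_policy(states):
    return -0.5 * states, 0.1 * np.ones_like(states)

toy_ensemble = [lambda s, a, k=k: s + a * (1.0 + 0.05 * k) for k in range(3)]
toy_reward = lambda s, a: -np.sum(s ** 2, axis=-1)

rng = np.random.default_rng(0)
action = plan_action(np.array([1.0, -1.0]), toy_policy,
                     toy_ensemble, toy_reward, rng)
# Setting max_disagreement=0.0 prunes everything and exercises the fallback.
strict = plan_action(np.array([1.0, -1.0]), toy_policy,
                     toy_ensemble, toy_reward, rng, max_disagreement=0.0)
```

Raising `std_scale` widens the search around the behavior policy, while lowering `max_disagreement` prunes more rollouts; in this sketch the two knobs stand in for the trade-off between exploration and staying close to the data.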

Code & Models

Repositories

zhanzxy5/MOPP · tf · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Smart Grid Security and Resilience