Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning

Pengqin Wang; Meixin Zhu; Shaojie Shen

arXiv:2303.03811·cs.LG·October 17, 2023·1 cites

Open Access

TL;DR

This paper introduces Environment Transformer, an uncertainty-aware sequence model for model-based offline reinforcement learning, improving simulation accuracy and efficiency, and enhancing policy learning in offline RL benchmarks.

Contribution

It proposes Environment Transformer, a novel uncertainty-aware sequence modeling architecture that captures environment dynamics and reward uncertainties, enabling more accurate and efficient offline RL training.

Findings

01 Achieves state-of-the-art performance on offline RL benchmarks.

02 Demonstrates superior simulation quality and long-term rollout capabilities.

03 Reduces training time and computational resources compared to probabilistic ensemble methods.

Abstract

Interacting with the actual environment to acquire data is often costly and time-consuming in robotic tasks. Model-based offline reinforcement learning (RL) provides a feasible solution. On the one hand, it eliminates the need to interact with the actual environment. On the other hand, it learns the transition dynamics and reward function from offline datasets and generates simulated rollouts to accelerate training. Previous model-based offline RL methods adopt probabilistic ensemble neural networks (NNs) to model aleatoric and epistemic uncertainty. However, this results in an exponential increase in training time and computing resource requirements. Furthermore, these methods are easily disturbed by the accumulated errors of the environment dynamics models when simulating long-term rollouts. To solve the above problems, we propose an uncertainty-aware…
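To make the baseline described in the abstract concrete, here is a minimal sketch of a probabilistic ensemble dynamics model and a simulated rollout. It is not the paper's Environment Transformer, and it is not taken from the paper: it assumes a toy linear-Gaussian model in place of neural networks, a synthetic offline dataset, and hypothetical names (`fit_member`, `predict`, `rollout`). Aleatoric uncertainty is estimated as the average predicted noise variance across members, and epistemic uncertainty as the disagreement between member means.

```python
import numpy as np


def fit_member(states, actions, next_states, rng):
    """Fit one linear-Gaussian dynamics model on a bootstrap resample.

    Assumption: a linear model stands in for the neural network used in practice.
    """
    n = len(states)
    idx = rng.integers(0, n, size=n)  # bootstrap resample of the offline data
    X = np.hstack([states[idx], actions[idx], np.ones((n, 1))])
    Y = next_states[idx]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least-squares weights
    noise_var = (Y - X @ W).var(axis=0) + 1e-8  # per-dimension residual variance
    return W, noise_var


def predict(ensemble, state, action):
    """Return the ensemble mean plus aleatoric and epistemic variance."""
    x = np.concatenate([state, action, [1.0]])
    means = np.stack([x @ W for W, _ in ensemble])  # shape (K, state_dim)
    variances = np.stack([v for _, v in ensemble])  # shape (K, state_dim)
    aleatoric = variances.mean(axis=0)  # average predicted noise
    epistemic = means.var(axis=0)  # disagreement between members
    return means.mean(axis=0), aleatoric, epistemic


def rollout(ensemble, state, policy, horizon, rng):
    """Simulate a trajectory, sampling a random ensemble member at each step."""
    trajectory = [state]
    for _ in range(horizon):
        action = policy(state)
        W, noise_var = ensemble[rng.integers(len(ensemble))]
        x = np.concatenate([state, action, [1.0]])
        state = x @ W + rng.normal(0.0, np.sqrt(noise_var))
        trajectory.append(state)
    return np.stack(trajectory)


# Synthetic offline dataset (hypothetical): 2-D state, 1-D action, noise std 0.1.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 2))
actions = rng.normal(size=(500, 1))
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.5]])
next_states = states @ A.T + actions @ B.T + 0.1 * rng.normal(size=(500, 2))

ensemble = [fit_member(states, actions, next_states, rng) for _ in range(5)]
mean, aleatoric, epistemic = predict(ensemble, np.zeros(2), np.zeros(1))
traj = rollout(ensemble, np.zeros(2), lambda s: np.array([-0.1 * s[1]]), 10, rng)
```

Because each step of the rollout feeds a predicted state back into the model, prediction errors compound over the horizon, which is the long-rollout weakness the abstract points to. Training cost in this sketch also grows with the number of ensemble members, since each member is fit separately.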

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Robot Manipulation and Learning

Methods: Attention Is All You Need · Q-Learning · Linear Layer · Dropout · Layer Normalization · Residual Connection · Multi-Head Attention · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Softmax