Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning
Pengqin Wang, Meixin Zhu, Shaojie Shen

TL;DR
This paper introduces Environment Transformer, an uncertainty-aware sequence model for model-based offline reinforcement learning that improves simulation accuracy and efficiency and enhances policy learning on offline RL benchmarks.
Contribution
It proposes Environment Transformer, a novel uncertainty-aware sequence modeling architecture that captures the uncertainty of both the environment dynamics and the reward, enabling more accurate and efficient offline RL training.
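This summary does not say how the uncertainty is represented. A common choice in model-based RL, assumed here and not confirmed by the summary, is for the model to output a mean and a log-variance for each dimension of the predicted next state and reward, trained with the Gaussian negative log-likelihood. The sketch below shows that loss for one sample; `gaussian_nll` is a hypothetical helper, not the authors' code.

```python
import math


def gaussian_nll(target, mean, log_var):
    """Negative log-likelihood of `target` under a diagonal Gaussian.

    Assumption (not from the paper): `mean` and `log_var` are the model's
    per-dimension predictions for the concatenated [next_state, reward].
    """
    if not (len(target) == len(mean) == len(log_var)):
        raise ValueError("target, mean and log_var must have equal length")
    total = 0.0
    for y, mu, lv in zip(target, mean, log_var):
        # 0.5 * (log(2*pi) + log(sigma^2) + (y - mu)^2 / sigma^2)
        total += 0.5 * (math.log(2 * math.pi) + lv + (y - mu) ** 2 / math.exp(lv))
    return total


# A confidently wrong prediction (small variance) costs far more than an
# equally wrong prediction that admits its uncertainty (unit variance).
confident_wrong = gaussian_nll([1.0], [0.0], [math.log(0.01)])
uncertain_wrong = gaussian_nll([1.0], [0.0], [0.0])
print(confident_wrong, uncertain_wrong)
```

Predicting the log-variance rather than the variance keeps the predicted variance positive without an explicit constraint, which is why the sketch exponentiates `lv`.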
Findings
Achieves state-of-the-art performance on offline RL benchmarks.
Demonstrates superior simulation quality and long-term rollout capabilities.
Reduces training time and computational resources compared to probabilistic ensemble methods.
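Long-term rollout quality matters because a learned one-step model is applied autoregressively: each prediction becomes the next input, so small one-step errors compound. The toy below uses hypothetical scalar dynamics (`true_step`, `model_step`) where the learned model has a constant 0.01 bias per step; it illustrates the general effect and is not an experiment from the paper.

```python
def rollout(step_fn, s0, horizon):
    """Apply a one-step model autoregressively for `horizon` steps."""
    states = [s0]
    for _ in range(horizon):
        states.append(step_fn(states[-1]))  # feed the prediction back in
    return states


def true_step(s):
    return 1.05 * s  # hypothetical true dynamics


def model_step(s):
    return 1.05 * s + 0.01  # hypothetical learned model with a small bias


true_traj = rollout(true_step, 1.0, 50)
model_traj = rollout(model_step, 1.0, 50)
errors = [abs(m - t) for m, t in zip(model_traj, true_traj)]
print(f"one-step error {errors[1]:.3f}, 50-step error {errors[50]:.3f}")
```

Here the error obeys e_k = 1.05 e_{k-1} + 0.01, so e_k = 0.2(1.05^k - 1): about 0.01 after one step but roughly 2.09 after 50 steps.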
Abstract
Interacting with the actual environment to acquire data is often costly and time-consuming in robotic tasks. Model-based offline reinforcement learning (RL) provides a feasible solution. On the one hand, it eliminates the need to interact with the actual environment. On the other hand, it learns the transition dynamics and reward function from the offline datasets and generates simulated rollouts to accelerate training. Previous model-based offline RL methods adopt probabilistic ensemble neural networks (NN) to model aleatoric uncertainty and epistemic uncertainty. However, this results in an exponential increase in training time and computing resource requirements. Furthermore, these methods are easily disturbed by the accumulated errors of the environment dynamics models when simulating long-term rollouts. To address these problems, we propose an uncertainty-aware…
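For reference, a probabilistic ensemble usually separates the two kinds of uncertainty mentioned above with the law of total variance: for one output dimension, the average of the members' predicted variances estimates aleatoric (data) noise, and the variance of the members' means estimates epistemic (model) uncertainty. The minimal sketch below assumes each member outputs a Gaussian mean and variance; `decompose_uncertainty` is a hypothetical helper, not code from the paper. It also makes the cost argument visible: every member is a separate network that must be trained and queried.

```python
from statistics import fmean


def decompose_uncertainty(means, variances):
    """Split an ensemble's predictive variance for one output dimension.

    means[i], variances[i]: Gaussian prediction of ensemble member i.
    Returns (aleatoric, epistemic, total).
    """
    if len(means) != len(variances) or not means:
        raise ValueError("need one (mean, variance) pair per member")
    aleatoric = fmean(variances)  # noise each member attributes to the data
    mu_bar = fmean(means)
    epistemic = fmean((m - mu_bar) ** 2 for m in means)  # member disagreement
    return aleatoric, epistemic, aleatoric + epistemic


# Three hypothetical members that agree on the noise level but not the mean.
aleatoric, epistemic, total = decompose_uncertainty([1.0, 1.2, 0.8], [0.1, 0.1, 0.1])
print(aleatoric, epistemic, total)
```

Epistemic uncertainty shrinks where the members agree (typically in well-covered regions of the offline dataset), which is what makes it useful for flagging unreliable simulated transitions.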
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Robot Manipulation and Learning
Methods: Attention Is All You Need · Q-Learning · Linear Layer · Dropout · Layer Normalization · Residual Connection · Multi-Head Attention · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Softmax
