Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning
Dongsu Lee, Minhae Kwon

TL;DR
This paper introduces TempDATA, a novel offline model-based reinforcement learning framework that generates temporally structured transition augmentations in latent space, improving performance in long-horizon, sparse-reward tasks.
Contribution
TempDATA is the first to model temporal distance in latent space for transition augmentation, enhancing offline MBRL in complex long-horizon environments.
Findings
TempDATA outperforms previous offline MBRL methods.
It achieves results comparable to or better than diffusion-based augmentation.
It is effective in long-horizon, sparse-reward tasks.
Abstract
The goal of offline reinforcement learning (RL) is to extract a high-performance policy from a fixed dataset while minimizing performance degradation due to out-of-distribution (OOD) samples. Offline model-based RL (MBRL) is a promising approach that ameliorates OOD issues by enriching state-action transitions with augmentations synthesized via a learned dynamics model. Unfortunately, seminal offline MBRL methods often struggle in sparse-reward, long-horizon tasks. In this work, we introduce a novel MBRL framework, dubbed Temporal Distance-Aware Transition Augmentation (TempDATA), that generates augmented transitions in a temporally structured latent space rather than in raw state space. To model long-horizon behavior, TempDATA learns a latent abstraction that captures temporal distance at both the trajectory and transition levels of the state space. Our experiments confirm that TempDATA…
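The idea of a latent space that "captures temporal distance" can be illustrated with a toy sketch. The code below is not the paper's objective or architecture: the linear encoder, the function names `encode` and `temporal_distance_loss`, and the squared-error loss are all assumptions made for illustration. It only shows one simple way to score whether distances between latent codes track the number of environment steps separating two states in a trajectory.

```python
import numpy as np


def encode(states: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical linear encoder: maps states (T, s_dim) to latents (T, z_dim)."""
    return states @ weights.T


def temporal_distance_loss(latents: np.ndarray) -> float:
    """Mean squared gap between latent distance and step distance.

    Illustrative assumption: for every pair (i, j) in one trajectory, the
    Euclidean distance between latents should equal |i - j| steps.
    """
    num_steps = latents.shape[0]
    idx = np.arange(num_steps)
    step_gap = np.abs(idx[:, None] - idx[None, :])        # (T, T) step counts
    diff = latents[:, None, :] - latents[None, :, :]      # (T, T, z_dim)
    latent_gap = np.linalg.norm(diff, axis=-1)            # (T, T) distances
    return float(np.mean((latent_gap - step_gap) ** 2))


# Toy trajectory: the agent moves one unit along the x-axis per step.
states = np.stack([np.arange(5.0), np.zeros(5)], axis=1)

# An identity encoder preserves the one-unit-per-step geometry;
# a shrunken encoder compresses it, so latent distance under-reports steps.
loss_aligned = temporal_distance_loss(encode(states, np.eye(2)))
loss_shrunk = temporal_distance_loss(encode(states, 0.1 * np.eye(2)))

print(loss_aligned, loss_shrunk)
```

In this toy setting the identity encoder gives a lower loss than the shrunken one, because only the former keeps latent distance equal to step count. A learned encoder trained on such a signal would presumably be nonlinear and would be paired with a latent dynamics model for generating transitions; those parts are outside this sketch.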
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning
