Model-based Trajectory Stitching for Improved Offline Reinforcement Learning

Charles A. Hepburn; Giovanni Montana

arXiv:2211.11603·cs.LG·November 22, 2022

TL;DR

This paper introduces Trajectory Stitching (TS), a model-based data augmentation method for offline RL. TS joins previously disconnected states in the historical data through synthetic actions, and the augmented data leads to better-performing policies.

Contribution

The paper presents Trajectory Stitching, a technique that uses a probabilistic notion of state reachability to join parts of historical trajectories through synthetic actions, improving the data available for offline reinforcement learning.

Findings

01 Trajectory Stitching (TS) produces data that yields better policies than the original data.

02 Combining TS with behavioral cloning improves offline RL results.

03 The method provides a better initialization for online RL.

Abstract

In many real-world applications, collecting large and high-quality datasets may be too costly or impractical. Offline reinforcement learning (RL) aims to infer an optimal decision-making policy from a fixed set of data. Getting the most information from historical data is then vital for good performance once the policy is deployed. We propose a model-based data augmentation strategy, Trajectory Stitching (TS), to improve the quality of sub-optimal historical trajectories. TS introduces unseen actions joining previously disconnected states: using a probabilistic notion of state reachability, it effectively 'stitches' together parts of the historical demonstrations to generate new, higher-quality ones. A stitching event consists of a transition between a pair of observed states through a synthetic and highly probable action. New actions are introduced only when they are expected to be…
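
The stitching event described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the isotropic Gaussian dynamics model, the helper names (gaussian_log_prob, find_stitch, predict_mean, value_fn, inverse_dynamics), the toy one-dimensional environment, and in particular the acceptance rule, which here requires a candidate state to be at least as reachable as the observed next state and to have a higher estimated value.

```python
import numpy as np


def gaussian_log_prob(x, mean, std):
    """Log-density of an isotropic Gaussian, summed over state dimensions."""
    z = (x - mean) / std
    return float(np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)))


def find_stitch(state, observed_next, candidates, predict_mean, model_std,
                value_fn, inverse_dynamics):
    """Return (candidate_index, synthetic_action), or None if no stitch is made.

    Assumed acceptance rule (hypothetical, for this sketch only): a candidate
    must be at least as likely under the dynamics model as the observed next
    state, and must have a strictly higher estimated value.
    """
    mean = predict_mean(state)
    base_lp = gaussian_log_prob(observed_next, mean, model_std)
    best_idx, best_value = None, value_fn(observed_next)
    for i, cand in enumerate(candidates):
        # Reachability check: skip states the model considers less plausible.
        if gaussian_log_prob(cand, mean, model_std) < base_lp:
            continue
        # Improvement check: keep the reachable candidate with the best value.
        v = value_fn(cand)
        if v > best_value:
            best_idx, best_value = i, v
    if best_idx is None:
        return None
    # The synthetic action is the one the inverse model says links the pair.
    return best_idx, inverse_dynamics(state, candidates[best_idx])


# Toy 1-D setup (all hypothetical): the state drifts by +1 per step,
# the goal sits at 10, and the action is simply the displacement.
predict_mean = lambda s: s + 1.0
value_fn = lambda s: -abs(float(s[0]) - 10.0)
inverse_dynamics = lambda s, s_next: s_next - s

state = np.array([0.0])
observed_next = np.array([0.5])
candidates = [np.array([1.0]), np.array([5.0]), np.array([0.4])]

result = find_stitch(state, observed_next, candidates, predict_mean, 0.5,
                     value_fn, inverse_dynamics)
print(result)
```

In this toy setup, the candidate at 5.0 is rejected as implausible under the model and the candidate at 0.4 as less likely than the observed transition, so only the candidate at 1.0 can be stitched. A practical version would search candidates with a nearest-neighbour structure and use learned models in place of the lambdas.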

Taxonomy

Topics: Reinforcement Learning in Robotics · Behavioral and Psychological Studies

Methods: Spatio-temporal stability analysis