BATS: Best Action Trajectory Stitching
Ian Char, Viraj Mehta, Adam Villaflor, John M. Dolan, Jeff Schneider

TL;DR
This paper introduces BATS, an offline reinforcement learning algorithm that augments the logged dataset with new trajectories planned by learned dynamics models. This enables planning and policy improvement without adding constraints to an online reinforcement learning algorithm to keep the learned policy on the logged data.
Contribution
BATS proposes a dataset augmentation method for offline RL that adds short trajectories planned with learned dynamics models, so that planning and value estimation can be carried out directly on the resulting constructed MDP.
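To make the idea of a constructed MDP concrete, here is a minimal sketch of how logged transitions could be turned into a tabular graph and then augmented with "stitched" edges. It is a simplification under stated assumptions, not the authors' implementation. The distance threshold `epsilon` and the callable `try_stitch` are hypothetical. In particular, `try_stitch` stands in for planning a short trajectory with a learned dynamics model, and the toy version below simply accepts small hops in a 1-D state space.

```python
import numpy as np

def build_stitched_mdp(transitions, epsilon, try_stitch):
    """Turn logged transitions into a tabular MDP and add stitched edges.

    transitions: list of (state, action, reward, next_state, done); states are tuples.
    epsilon: hypothetical distance threshold for proposing a stitch.
    try_stitch: hypothetical callable(s_from, s_to) -> (action, reward) or None,
        standing in for planning a short trajectory with a learned dynamics model.
    Returns (states, edges), where edges[i] is a list of (action, reward, j, done).
    """
    states, index = [], {}

    def node(s):
        # Map each distinct logged state to an integer node id.
        if s not in index:
            index[s] = len(states)
            states.append(s)
        return index[s]

    edges = {}
    for s, a, r, s_next, done in transitions:
        i, j = node(s), node(s_next)
        edges.setdefault(i, []).append((a, r, j, done))  # logged edge

    X = np.asarray(states, dtype=float)
    for i in range(len(states)):
        dists = np.linalg.norm(X - X[i], axis=1)
        existing = {e[2] for e in edges.get(i, [])}
        for j in map(int, np.flatnonzero((dists > 0) & (dists <= epsilon))):
            if j in existing:
                continue  # already reachable through a logged edge
            result = try_stitch(states[i], states[j])
            if result is not None:
                action, reward = result
                edges.setdefault(i, []).append((action, reward, j, False))  # stitched edge
    return states, edges

# Toy 1-D log: trajectory A never reaches the reward that trajectory B collects.
log = [((0.0,), 1.0, 0.0, (1.0,), False),
       ((1.0,), 1.0, 0.0, (2.0,), False),
       ((1.2,), 1.0, 0.0, (2.2,), False),
       ((2.2,), 1.0, 10.0, (3.2,), True)]

def try_stitch(s_from, s_to):
    # Hypothetical stand-in for model-based planning: accept hops of at most 0.5.
    delta = s_to[0] - s_from[0]
    return (delta, 0.0) if abs(delta) <= 0.5 else None

states, edges = build_stitched_mdp(log, epsilon=0.5, try_stitch=try_stitch)
```

In this toy log, the stitched edge from state (1.0,) to (1.2,) connects trajectory A to trajectory B. Because the remaining edges still come from the log, most of the graph stays grounded in observed data.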
Findings
BATS can use learned dynamics models to plan short trajectories that connect states in the logged data.
The method provides bounds on value functions based on dataset properties.
Behavior cloning the optimal policy of the constructed MDP avoids unwanted behaviors.
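The behavior-cloning finding can be illustrated with a small sketch. It rolls out the greedy policy of a constructed MDP, collects (state, action) pairs along the way, and fits a policy to them. Several assumptions apply. The value vector `V` is supplied as toy numbers, as if it had already been computed on the constructed MDP. The edge format `(action, reward, next_node, done)` is assumed. The linear least-squares policy is a deliberately simple stand-in for whatever function approximator one would actually use. The helper names are hypothetical.

```python
import numpy as np

def greedy_rollout(states, edges, V, start, gamma=0.99, max_steps=100):
    """Follow the greedy policy of a constructed MDP from node `start`.

    edges[i] holds (action, reward, j, done) tuples; V holds state values.
    Returns (state, action) pairs to use as behavior-cloning targets.
    """
    pairs, i = [], start
    for _ in range(max_steps):
        if not edges.get(i):
            break  # no outgoing edge: stop the rollout
        # Pick the edge maximizing reward plus discounted value of its successor.
        a, r, j, done = max(edges[i], key=lambda e: e[1] + (0.0 if e[3] else gamma * V[e[2]]))
        pairs.append((states[i], a))
        if done:
            break
        i = j
    return pairs

def fit_linear_policy(pairs):
    """Hypothetical BC model: least-squares fit of a ~ w . s + b."""
    S = np.array([s for s, _ in pairs], dtype=float)
    A = np.array([a for _, a in pairs], dtype=float)
    X = np.hstack([S, np.ones((len(S), 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X, A, rcond=None)
    return coef

# Toy constructed MDP with 1-D states; the edge from node 1 to node 3 is stitched.
states = [(0.0,), (1.0,), (2.0,), (1.2,), (2.2,)]
edges = {0: [(1.0, 0.0, 1, False)],
         1: [(1.0, 0.0, 2, False), (0.2, 0.0, 3, False)],
         3: [(1.0, 10.0, 4, True)]}
V = np.array([8.1, 9.0, 0.0, 10.0, 0.0])  # toy values, consistent with gamma = 0.9
pairs = greedy_rollout(states, edges, V, start=0, gamma=0.9)
coef = fit_linear_policy(pairs)
```

The cloned data contains only actions that the constructed MDP's optimal policy actually takes, here the stitched hop toward the rewarding trajectory. It does not contain every action that appears somewhere in the dataset.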
Abstract
The problem of offline reinforcement learning focuses on learning a good policy from a log of environment interactions. Past efforts for developing algorithms in this area have revolved around introducing constraints to online reinforcement learning algorithms to ensure the actions of the learned policy are constrained to the logged data. In this work, we explore an alternative approach by planning on the fixed dataset directly. Specifically, we introduce an algorithm which forms a tabular Markov Decision Process (MDP) over the logged data by adding new transitions to the dataset. We do this by using learned dynamics models to plan short trajectories between states. Since exact value iteration can be performed on this constructed MDP, it becomes easy to identify which trajectories are advantageous to add to the MDP. Crucially, since most transitions in this MDP come from the logged…
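Because the constructed MDP is tabular, its values can be computed exactly with value iteration. The sketch below assumes deterministic edges stored as `(action, reward, next_node, done)` tuples. It also treats nodes with no outgoing edges as terminal with value 0, which is an assumption for illustration rather than a detail taken from the paper.

```python
import numpy as np

def value_iteration(edges, n_states, gamma=0.99, tol=1e-8, max_iters=10_000):
    """Exact value iteration on a finite MDP with deterministic edges.

    edges[i] is a list of (action, reward, j, done) tuples. A node with no
    outgoing edges is treated as terminal with value 0 (an assumption).
    """
    V = np.zeros(n_states)
    for _ in range(max_iters):
        V_new = np.zeros(n_states)
        for i in range(n_states):
            # Bellman optimality backup over every edge leaving node i.
            options = [r + (0.0 if done else gamma * V[j]) for _, r, j, done in edges.get(i, [])]
            V_new[i] = max(options) if options else 0.0
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Toy MDP: the edge from node 1 to node 3 is a stitched transition leading to reward.
edges = {0: [(1.0, 0.0, 1, False)],
         1: [(1.0, 0.0, 2, False), (0.2, 0.0, 3, False)],
         3: [(1.0, 10.0, 4, True)]}
V = value_iteration(edges, n_states=5, gamma=0.9)
```

Exact backups make it straightforward to check which added transitions raise the value of states in the log. In this toy case, node 1's value comes entirely from the stitched edge.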
