Trajectory-Level Data Augmentation for Offline Reinforcement Learning
Tobias Schmähling, Matthias Burkhardt, Tobias Windisch

TL;DR
This paper introduces a trajectory-based data augmentation method for offline reinforcement learning that enables training off-policy models from a limited number of suboptimal trajectories by exploiting task structure and the geometric relationship between rewards and value functions.
Contribution
A trajectory-level augmentation technique that exploits task structure and mathematical properties of logging policies to improve offline RL performance when only suboptimal trajectories are available.
Findings
Empirical validation across positioning tasks of varying dimensionality, including under partial observability, shows improved offline RL performance.
Theoretical justification supports the effectiveness of the augmentation strategy.
The augmentation supports suboptimal logging policies during data collection, which leads to higher data quality.
Abstract
We propose a data augmentation method for offline reinforcement learning, motivated by active positioning problems. In particular, our approach enables the training of off-policy models from a limited number of suboptimal trajectories. We introduce a trajectory-based augmentation technique that exploits task structure and the geometric relationship between rewards, value functions, and mathematical properties of logging policies. During data collection, our augmentation supports suboptimal logging policies, leading to higher data quality and improved offline reinforcement learning performance. We provide theoretical justification for these strategies and validate them empirically across positioning tasks of varying dimensionality and under partial observability.
