Trajectory-Level Data Augmentation for Offline Reinforcement Learning

Tobias Schmähling; Matthias Burkhardt; Tobias Windisch

arXiv:2605.13401·cs.LG·May 14, 2026

TL;DR

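This paper introduces a trajectory-level data augmentation method for offline reinforcement learning. It enables off-policy models to be trained from a limited number of suboptimal trajectories by exploiting task structure and the geometric relationship between rewards, value functions, and properties of the logging policy.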

Contribution

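The paper presents a trajectory-based augmentation technique, motivated by active positioning problems, that exploits task structure and mathematical properties of logging policies to improve offline RL performance when only suboptimal trajectories are available.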

Findings

01

Empirical validation on positioning tasks of varying dimensionality, including under partial observability, shows improved offline RL performance.

02

Theoretical justification supports the effectiveness of the augmentation strategy.

03

The augmentation supports suboptimal logging policies during data collection, which the authors report leads to higher data quality.

Abstract

We propose a data augmentation method for offline reinforcement learning, motivated by active positioning problems. In particular, our approach enables the training of off-policy models from a limited number of suboptimal trajectories. We introduce a trajectory-based augmentation technique that exploits task structure and the geometric relationship between rewards, value functions, and mathematical properties of logging policies. During data collection, our augmentation supports suboptimal logging policies, leading to higher data quality and improved offline reinforcement learning performance. We provide theoretical justification for these strategies and validate them empirically across positioning tasks of varying dimensionality and under partial observability.
