BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning

Yunpeng Qing; Yixiao Chi; Shuo Chen; Shunyu Liu; Kexuan Zhou; Sixu Lin; Litao Liu; Changqing Zou

arXiv:2506.05762·cs.LG·May 15, 2026

TL;DR

BiTrajDiff introduces a bidirectional diffusion-based data augmentation method for offline RL, enhancing dataset diversity by modeling both future and past trajectories from any state.

Contribution

It proposes a novel bidirectional diffusion framework that models both forward and backward trajectories, improving data augmentation for offline RL.

Findings

1. BiTrajDiff outperforms existing data augmentation methods on D4RL benchmarks.

2. The bidirectional approach enhances exploration of underrepresented state regions.

3. Experiments show improved policy performance across various offline RL algorithms.

Abstract

Recent advances in offline Reinforcement Learning (RL) have proven that effective policy learning can benefit from imposing conservative constraints on pre-collected datasets. However, such static datasets often exhibit distribution bias, resulting in limited generalizability. To address this limitation, a straightforward solution is data augmentation (DA), which leverages generative models to enrich data distribution. Despite the promising results, current DA techniques focus solely on reconstructing future trajectories from given states, while ignoring the exploration of history transitions that reach them. This single-direction paradigm inevitably hinders the discovery of diverse behavior patterns, especially those leading to critical states that may have yielded high-reward outcomes. In this work, we introduce Bidirectional Trajectory Diffusion (BiTrajDiff), a novel DA framework for…
