Provable Domain Adaptation for Offline Reinforcement Learning with Limited Samples
Weiqin Chen, Xinjie Zhang, Sandipan Mishra, Santiago Paternain

TL;DR
This paper introduces a theoretical framework for offline reinforcement learning that optimally balances limited target data with auxiliary source data, providing performance guarantees and empirical validation on benchmark tasks.
Contribution
It presents the first theoretical analysis of dataset weighting in offline RL, establishing performance bounds and optimal weight computation methods.
Findings
Performance bounds depend on source data quality and target sample size.
Optimal dataset weights can be computed in closed form.
Empirical results validate theoretical guarantees on benchmarks.
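As a rough illustration of what weighting the two datasets can mean, the sketch below mixes the mean loss on a small target dataset with the mean loss on a larger source dataset. It is not the paper's objective or its closed-form weight, neither of which appears in this summary. The function name `weighted_loss`, the parameter `lam`, and the convention that `lam` is the weight on the source data are all assumptions made for illustration.

```python
from statistics import fmean


def weighted_loss(target_losses, source_losses, lam):
    """Convex combination of the mean target loss and the mean source loss.

    Hypothetical helper: `lam` in [0, 1] is the assumed weight on the source
    dataset, so lam = 0 uses only target data and lam = 1 only source data.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * fmean(target_losses) + lam * fmean(source_losses)


# Toy per-sample losses: few target samples, more (possibly biased) source samples.
target = [1.0, 3.0]
source = [0.0, 0.0, 0.0]
mixed = weighted_loss(target, source, 0.5)
```

Under this convention, a small `lam` trusts the scarce target samples, while a large `lam` leans on the plentiful but biased source samples. That is the trade-off the findings describe.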
Abstract
Offline reinforcement learning (RL) learns effective policies from a static target dataset. Despite the strong performance of state-of-the-art offline RL algorithms, that performance depends on the size of the target dataset and degrades when only limited target samples are available, which is often the case in real-world applications. To address this issue, domain adaptation that leverages auxiliary samples from related source datasets (such as simulators) can be beneficial. However, establishing the optimal way to trade off the limited target dataset against the large-but-biased source dataset while ensuring provable theoretical guarantees remains an open challenge. To the best of our knowledge, this paper proposes the first framework that theoretically explores the impact of the weights assigned to each dataset on the performance of offline RL. In particular, we establish performance…
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics
