Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning
Wei Liu, Ting Long

TL;DR
This paper introduces Target-Aligned Bellman Backup (TABB), a novel method for cross-domain offline reinforcement learning that assesses the transferability of source-domain data by how well its Bellman targets align with the target domain, improving the accuracy of policy learning.
Contribution
The paper proposes a new transferability measure for source data in cross-domain offline RL (CDRL) based on Bellman target alignment, addressing the limitations of methods that rely on transition-level similarity.
Findings
TABB outperforms existing methods across a range of cross-domain offline RL tasks.
TABB achieves consistently strong performance with limited target-domain data.
Measuring transferability by Bellman target alignment improves the accuracy of policy learning.
Abstract
Cross-domain offline reinforcement learning (CDRL) aims to improve policy learning in a target domain by leveraging data collected from a source domain. Existing works typically assess the transferability of source-domain data by measuring its similarity to target-domain transitions, and implicitly perform transition-level selection. Transitions that are considered similar are assigned higher weights or rewards, while dissimilar ones are down-weighted. However, transition-level similarity does not necessarily imply consistency in long-term returns. Even visually or dynamically similar transitions may lead to significantly different outcomes in the target domain, which can mislead policy learning and degrade performance. To address this issue, we revisit the fundamental objective of policy learning. Since policy optimization ultimately relies on Bellman targets to evaluate the quality of…
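The abstract's central argument is that transitions that look alike can still carry very different Bellman targets. The toy sketch below makes that concrete. Everything in it is hypothetical: the cliff-shaped value function `value_fn`, the exponential weights `similarity_weight` and `alignment_weight`, the temperature, the discount factor, and all numbers are assumptions chosen for illustration. None of it is TABB's actual transferability measure, which the truncated abstract does not specify.

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed for illustration)


def value_fn(state: np.ndarray) -> float:
    """Hypothetical target-domain value: a 'cliff' where x > 1.0 is bad."""
    return 5.0 if state[0] <= 1.0 else -10.0


def bellman_target(reward: float, next_state: np.ndarray, done: float,
                   gamma: float = GAMMA) -> float:
    """One-step Bellman target y = r + gamma * (1 - done) * V(s')."""
    return reward + gamma * (1.0 - done) * value_fn(next_state)


def similarity_weight(src_next: np.ndarray, tgt_next: np.ndarray,
                      temperature: float = 1.0) -> float:
    """Transition-level score (hypothetical): closeness of the two next states."""
    dist = float(np.linalg.norm(src_next - tgt_next))
    return float(np.exp(-dist / temperature))


def alignment_weight(y_src: float, y_tgt: float,
                     temperature: float = 1.0) -> float:
    """Target-alignment score (hypothetical): closeness of the Bellman targets."""
    return float(np.exp(-abs(y_src - y_tgt) / temperature))


# A source transition's next state, and the next state the target-domain
# dynamics would produce from the same (s, a). Both values are made up.
reward, done = 0.0, 0.0
src_next = np.array([1.00, 0.0])
tgt_next = np.array([1.05, 0.0])

y_src = bellman_target(reward, src_next, done)  # 0.99 * 5.0
y_tgt = bellman_target(reward, tgt_next, done)  # 0.99 * -10.0

w_sim = similarity_weight(src_next, tgt_next)
w_align = alignment_weight(y_src, y_tgt)
print(f"similarity weight: {w_sim:.3f}, alignment weight: {w_align:.2e}")
```

The two next states differ by only 0.05, so a similarity-based score keeps the transition at nearly full weight. One of them, however, falls past the cliff, so the two Bellman targets differ by almost 15 and an alignment-based score all but discards the transition. This is the failure mode the abstract attributes to transition-level selection.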
