Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning

Wei Liu; Ting Long

arXiv:2605.22376·cs.LG·May 22, 2026


TL;DR

This paper introduces TABB (Target-Aligned Bellman Backup), a method for cross-domain offline reinforcement learning that assesses the transferability of source-domain data by Bellman target alignment rather than transition similarity, which improves policy learning accuracy.

Contribution

The paper proposes a new transferability measure for source data in cross-domain offline reinforcement learning (CDRL) based on Bellman target alignment, addressing the limitations of methods that rely on transition similarity.

Findings

01

TABB outperforms existing methods in various cross-domain offline RL tasks.

02

TABB achieves consistently strong performance with limited target data.

03

Bellman target-based transferability improves policy learning accuracy.

Abstract

Cross-domain offline reinforcement learning (CDRL) aims to improve policy learning in a target domain by leveraging data collected from a source domain. Existing works typically assess the transferability of source-domain data by measuring its similarity to target-domain transitions, and implicitly perform transition-level selection. Transitions that are considered similar are assigned higher weights or rewards, while dissimilar ones are down-weighted. However, transition-level similarity does not necessarily imply consistency in long-term returns. Even visually or dynamically similar transitions may lead to significantly different outcomes in the target domain, which can mislead policy learning and degrade performance. To address this issue, we revisit the fundamental objective of policy learning. Since policy optimization ultimately relies on Bellman targets to evaluate the quality of…
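
The gap between transition-level similarity and Bellman-target consistency described in the abstract can be illustrated with a toy calculation. The sketch below is not the TABB algorithm, which the truncated abstract does not specify. It assumes the standard one-step Bellman target r + gamma * max_a' Q(s', a'), and the discount factor, all numeric values, and the helper names `bellman_target` and `target_gap` are hypothetical.

```python
import math

GAMMA = 0.9  # assumed discount factor, chosen only for easy arithmetic


def bellman_target(reward, next_q_values, gamma=GAMMA, done=False):
    """Standard one-step Bellman target: r + gamma * max_a' Q(s', a')."""
    if done:
        # Terminal transitions have no bootstrapped value.
        return reward
    return reward + gamma * max(next_q_values)


def target_gap(target_a, target_b):
    """Absolute disagreement between two Bellman targets (hypothetical measure)."""
    return abs(target_a - target_b)


# Same state-action pair and reward in both domains; the next states are
# almost identical, so a transition-similarity score would rate them as close.
next_state_source = (1.00, 0.50)
next_state_target = (1.02, 0.50)
next_state_distance = math.dist(next_state_source, next_state_target)

# Hypothetical Q-values at each next state. A small change in the next state
# can still cross into a region with very different long-term value.
q_at_source_next = [5.0, 4.0]
q_at_target_next = [0.5, 0.2]

y_source = bellman_target(1.0, q_at_source_next)
y_target = bellman_target(1.0, q_at_target_next)
gap = target_gap(y_source, y_target)

print(f"next-state distance: {next_state_distance:.3f}")
print(f"Bellman target gap:  {gap:.3f}")
```

In this toy setup the two next states are only about 0.02 apart, yet their Bellman targets differ by roughly 4, which is the kind of mismatch a purely transition-based similarity score would not detect.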
