Optimistic Transfer under Task Shift via Bellman Alignment
Jinhang Chai, Enpei Zhang, Elynn Chen, Yujun Yan

TL;DR
This paper introduces Bellman alignment and a re-weighted targeting method for transfer reinforcement learning, enabling effective reuse of source task data despite transition mismatches, with theoretical regret bounds and empirical validation.
Contribution
It proposes Bellman alignment as a new abstraction for transfer in online RL and develops RWT, a correction operator that improves transfer across tasks with transition differences.
Findings
RWT reduces task mismatch to a one-step correction.
Regret bounds scale with the complexity of the task shift, not with that of the target MDP.
Empirical results show consistent improvements over baseline methods.
Abstract
We study online transfer reinforcement learning (RL) in episodic Markov decision processes, where experience from related source tasks is available during learning on a target task. A fundamental difficulty is that task similarity is typically defined in terms of rewards or transitions, whereas online RL algorithms operate on Bellman regression targets. As a result, naively reusing source Bellman updates introduces systematic bias and invalidates regret guarantees. We identify one-step Bellman alignment as the correct abstraction for transfer in online RL and propose re-weighted targeting (RWT), an operator-level correction that retargets continuation values and compensates for transition mismatch via a change of measure. RWT reduces task mismatch to a fixed one-step correction and enables statistically sound reuse of source data. This alignment yields a two-stage RWT Q-learning…
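The change-of-measure idea mentioned in the abstract can be illustrated with a small tabular sketch. This is not the paper's RWT operator: it is a generic importance-weighting example that assumes the source and target next-state probabilities for a fixed state-action pair are known exactly, and that the source places positive probability on every next state the target can reach. The names `reweighted_target`, `exact_backup`, `p_src`, `p_tgt`, and the numbers used are all hypothetical.

```python
def reweighted_target(reward, next_state, value, p_src, p_tgt):
    """One-step Bellman target built from a SOURCE transition,
    re-weighted so that its expectation matches the TARGET task.

    p_src, p_tgt: next-state probabilities for one fixed (s, a) pair.
    Assumed known here purely for illustration.
    """
    # Density ratio (change of measure); requires p_src[next_state] > 0.
    ratio = p_tgt[next_state] / p_src[next_state]
    return reward + ratio * value[next_state]


def exact_backup(reward, value, probs):
    """Exact undiscounted one-step Bellman backup under `probs`."""
    return reward + sum(p * v for p, v in zip(probs, value))


# Hypothetical toy numbers: three next states, one (s, a) pair.
p_src = [0.5, 0.3, 0.2]   # source-task transition probabilities
p_tgt = [0.2, 0.3, 0.5]   # target-task transition probabilities
value = [1.0, 2.0, 4.0]   # continuation values of the target task
reward = 0.5

# Expectation of the re-weighted target over SOURCE next states.
expected_reweighted = sum(
    p_src[s] * reweighted_target(reward, s, value, p_src, p_tgt)
    for s in range(len(p_src))
)

naive_source = exact_backup(reward, value, p_src)   # biased for the target
target_backup = exact_backup(reward, value, p_tgt)  # what we actually want

print(f"naive source backup:   {naive_source:.3f}")
print(f"re-weighted (expected): {expected_reweighted:.3f}")
print(f"true target backup:    {target_backup:.3f}")
```

In this toy setting the naive source backup differs from the target backup, while the re-weighted target matches it in expectation. In practice the density ratio would have to be estimated, which is where the statistical analysis in the paper would come in.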
Taxonomy
Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Domain Adaptation and Few-Shot Learning
