Optimistic Transfer under Task Shift via Bellman Alignment

Jinhang Chai; Enpei Zhang; Elynn Chen; Yujun Yan

arXiv:2601.21924 · cs.LG · January 30, 2026

TL;DR

This paper introduces one-step Bellman alignment and a re-weighted targeting (RWT) method for online transfer reinforcement learning. RWT allows source-task data to be reused despite transition mismatch, and the approach is supported by theoretical regret bounds and empirical validation.

Contribution

It proposes Bellman alignment as a new abstraction for transfer in online RL and develops RWT, a correction operator that improves transfer across tasks with transition differences.

Findings

1. RWT reduces task mismatch to a fixed one-step correction.

2. Regret bounds scale with the complexity of the task shift rather than that of the target MDP.

3. Empirical results show consistent improvements over baseline methods.

Abstract

We study online transfer reinforcement learning (RL) in episodic Markov decision processes, where experience from related source tasks is available during learning on a target task. A fundamental difficulty is that task similarity is typically defined in terms of rewards or transitions, whereas online RL algorithms operate on Bellman regression targets. As a result, naively reusing source Bellman updates introduces systematic bias and invalidates regret guarantees. We identify one-step Bellman alignment as the correct abstraction for transfer in online RL and propose re-weighted targeting (RWT), an operator-level correction that retargets continuation values and compensates for transition mismatch via a change of measure. RWT reduces task mismatch to a fixed one-step correction and enables statistically sound reuse of source data. This alignment yields a two-stage RWT $Q$-learning…
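
The change-of-measure idea mentioned in the abstract can be illustrated with a generic importance-weighting sketch. This is not the paper's RWT operator, whose exact definition is not given on this page. It only shows why a source transition, once re-weighted by a density ratio and paired with target continuation values, gives an unbiased one-step target for the target task. The function names, the tabular setting, the shared reward, and the assumption that both transition kernels are known are all hypothetical simplifications made for this example.

```python
def reweighted_target(reward, s_next, p_src, p_tgt, v_target, gamma=1.0):
    """One-step target built from a SOURCE transition, corrected for the target task.

    reward   : reward for (s, a); assumed identical in both tasks (simplification)
    s_next   : next state sampled from the source kernel P_src(. | s, a)
    p_src    : list, P_src(. | s, a); assumed known and positive wherever p_tgt is
    p_tgt    : list, P_tgt(. | s, a); assumed known (hypothetical)
    v_target : list of continuation values for the target task, one per state
    """
    ratio = p_tgt[s_next] / p_src[s_next]  # density ratio (change of measure)
    return reward + gamma * ratio * v_target[s_next]


def naive_target(reward, s_next, v_target, gamma=1.0):
    """Uncorrected target: reuses the source next state without re-weighting."""
    return reward + gamma * v_target[s_next]


def expectation_under_source(p_src, target_fn):
    """Exact expectation of target_fn(s_next) when s_next ~ P_src(. | s, a)."""
    return sum(p * target_fn(s) for s, p in enumerate(p_src) if p > 0)


# Toy example with three next states (all numbers are made up).
p_src = [0.5, 0.3, 0.2]
p_tgt = [0.2, 0.3, 0.5]
v_target = [1.0, 2.0, 4.0]
reward = 0.5

# The Bellman backup we actually want: expectation under the TARGET kernel.
true_backup = reward + sum(p * v for p, v in zip(p_tgt, v_target))

corrected = expectation_under_source(
    p_src, lambda s: reweighted_target(reward, s, p_src, p_tgt, v_target)
)
uncorrected = expectation_under_source(
    p_src, lambda s: naive_target(reward, s, v_target)
)
```

In this toy setting `corrected` matches `true_backup` up to floating-point error, while `uncorrected` does not, which mirrors the abstract's point that naively reused source updates are systematically biased. In practice the density ratio would have to be estimated rather than read off known kernels.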

Taxonomy

Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Domain Adaptation and Few-Shot Learning