Robust Transfer Learning with Side Information
Akram S. Awad, Shihab Ahmed, Yue Wang, George K. Atia

TL;DR
This paper introduces a transfer learning framework for robust Markov Decision Processes that leverages side information to construct tighter uncertainty sets, improving policy robustness and sample efficiency under environment shift.
Contribution
The paper proposes a novel transfer approach using estimate-centered uncertainty sets with side information, enhancing robustness and efficiency over existing methods.
Findings
Outperforms state-of-the-art baselines in OpenAI Gym environments.
Provides finite-sample guarantees and error bounds for the learned policies.
Reduces sub-optimality gap under low-dimensional transition model assumptions.
Abstract
Robust Markov Decision Processes (MDPs) address environmental shift through distributionally robust optimization (DRO) by finding an optimal worst-case policy within an uncertainty set of transition kernels. However, standard DRO approaches require enlarging the uncertainty set under large shifts, which leads to overly conservative and pessimistic policies. In this paper, we propose a framework for transfer under environment shift that derives a robust target-domain policy via estimate-centered uncertainty sets, constructed through constrained estimation that integrates limited target samples with side information about the source-target dynamics. The side information includes bounds on feature moments, distributional distances, and density ratios, yielding improved kernel estimates and tighter uncertainty sets.
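To make the idea of a worst-case policy over an estimate-centered uncertainty set concrete, here is a minimal sketch of robust value iteration on a tabular MDP. It is not the paper's algorithm: the L1-ball uncertainty set, the function names `worst_case_expectation` and `robust_value_iteration`, and the parameters `radius`, `gamma`, and `iters` are all assumptions chosen for illustration. The sketch only shows why a tighter set (smaller `radius`) around a kernel estimate `P_hat` gives a less pessimistic value.

```python
import numpy as np

def worst_case_expectation(p_hat, v, radius):
    """Minimize p @ v over distributions p with ||p - p_hat||_1 <= radius.

    Assumption: an L1 ball centered at the estimate p_hat. The minimizer
    moves up to radius/2 of probability mass from the highest-value
    states onto the single lowest-value state.
    """
    p = np.asarray(p_hat, dtype=float).copy()
    v = np.asarray(v, dtype=float)
    lowest = int(np.argmin(v))
    budget = min(radius / 2.0, 1.0 - p[lowest])
    p[lowest] += budget
    # Remove the same amount of mass, starting from the highest-value states.
    for s in np.argsort(v)[::-1]:
        if budget <= 0:
            break
        if s == lowest:
            continue
        take = min(p[s], budget)
        p[s] -= take
        budget -= take
    return float(p @ v)

def robust_value_iteration(P_hat, R, radius, gamma=0.9, iters=200):
    """Robust value iteration with per-(state, action) L1 uncertainty sets.

    P_hat: array of shape (S, A, S), estimated transition kernel.
    R:     array of shape (S, A), rewards.
    Returns the robust value function and a greedy robust policy.
    """
    S, A, _ = P_hat.shape
    v = np.zeros(S)
    q = np.zeros((S, A))
    for _ in range(iters):
        for s in range(S):
            for a in range(A):
                q[s, a] = R[s, a] + gamma * worst_case_expectation(P_hat[s, a], v, radius)
        v = q.max(axis=1)
    return v, q.argmax(axis=1)
```

In this sketch, setting `radius=0` recovers ordinary value iteration on `P_hat`, and increasing `radius` can only lower the returned values, which mirrors the conservatism the abstract attributes to enlarged uncertainty sets.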
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Advanced Bandit Algorithms Research
