An Information-Theoretic Analysis of OOD Generalization in Meta-Reinforcement Learning
Xingtu Liu

TL;DR
This paper provides an information-theoretic framework for analyzing out-of-distribution generalization in meta-reinforcement learning, deriving bounds under various distribution shifts and examining a gradient-based algorithm.
Contribution
It derives new generalization bounds for meta-supervised learning and meta-RL under different distribution shifts, with the meta-RL bounds exploiting the structure of the underlying MDPs.
Findings
Established OOD generalization bounds for meta-supervised learning.
Formalized the generalization problem in meta-RL with bounds exploiting MDP structure.
Analyzed the performance of a gradient-based meta-RL algorithm.
Abstract
In this work, we study out-of-distribution (OOD) generalization in meta-reinforcement learning from an information-theoretic perspective. We begin by establishing OOD generalization bounds for meta-supervised learning under two distinct distribution shift scenarios: standard distribution mismatch and a broad-to-narrow training setting. Building on this foundation, we formalize the generalization problem in meta-reinforcement learning and establish fine-grained generalization bounds that exploit the structure of Markov Decision Processes. Lastly, we analyze the generalization performance of a gradient-based meta-reinforcement learning algorithm.
