PrivilegedDreamer: Explicit Imagination of Privileged Information for Rapid Adaptation of Learned Policies
Morgan Byrd, Jackson Crandell, Mili Das, Jessica Inman, Robert Wright, Sehoon Ha

TL;DR
PrivilegedDreamer is a model-based reinforcement learning framework that explicitly estimates the hidden parameters of a decision problem and conditions on those estimates, enabling rapid adaptation and outperforming existing methods on five HIP-MDP tasks.
Contribution
It introduces a novel dual recurrent architecture for explicit hidden parameter estimation and conditioning in model-based RL, improving adaptation in HIP-MDPs.
Findings
Outperforms state-of-the-art algorithms on five HIP-MDP tasks
Effective hidden parameter estimation from limited data
Ablation studies validate architecture components
Abstract
Numerous real-world control problems involve dynamics and objectives affected by unobservable hidden parameters, ranging from autonomous driving to robotic manipulation, which cause performance degradation during sim-to-real transfer. To represent these kinds of domains, we adopt hidden-parameter Markov decision processes (HIP-MDPs), which model sequential decision problems where hidden variables parameterize transition and reward functions. Existing approaches, such as domain randomization, domain adaptation, and meta-learning, simply treat the effect of hidden parameters as additional variance and often struggle to effectively handle HIP-MDP problems, especially when the rewards are parameterized by hidden variables. We introduce Privileged-Dreamer, a model-based reinforcement learning framework that extends the existing model-based approach by incorporating an explicit parameter…
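To make the HIP-MDP setting concrete, the following is a minimal sketch of a toy problem in which a single hidden parameter shapes both the transition and the reward, together with an explicit estimate of that parameter from a short rollout. Everything here is an illustrative assumption: the point-mass dynamics, the names `step`, `collect`, and `estimate_theta`, and the least-squares estimator are hypothetical stand-ins, not the recurrent estimation module described in the paper.

```python
import random

def step(state, action, theta, dt=0.1):
    """One toy HIP-MDP transition; theta is a hidden mass (assumed, illustrative)."""
    pos, vel = state
    # Transition depends on theta: the same force accelerates a heavier mass less.
    new_vel = vel + (action / theta) * dt
    new_pos = pos + new_vel * dt
    # Reward also depends on theta: the target position is theta itself.
    reward = -abs(new_pos - theta)
    return (new_pos, new_vel), reward

def collect(theta, n_steps=20, seed=0):
    """Roll out random actions and record (vel, action, new_vel) triples."""
    rng = random.Random(seed)
    state = (0.0, 0.0)
    data = []
    for _ in range(n_steps):
        action = rng.uniform(-1.0, 1.0)
        next_state, _ = step(state, action, theta)
        data.append((state[1], action, next_state[1]))
        state = next_state
    return data

def estimate_theta(transitions, dt=0.1):
    """Least-squares estimate of theta from observed velocity changes."""
    # Model: new_vel - vel = (action * dt) * (1 / theta).
    num = sum((nv - v) * a * dt for v, a, nv in transitions)
    den = sum((a * dt) ** 2 for _, a, _ in transitions)
    # The least-squares slope is num / den = 1 / theta, so theta = den / num.
    return den / num

if __name__ == "__main__":
    true_theta = 2.5
    theta_hat = estimate_theta(collect(true_theta))
    print(f"true theta = {true_theta}, estimate = {theta_hat:.4f}")
```

In this noiseless toy the estimate recovers the hidden mass from a 20-step rollout. A policy given the estimate could then aim for the theta-dependent target, which a policy that treats theta as unexplained variance cannot do directly.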
Taxonomy
Topics: Complex Systems and Decision Making
Methods: ADaptive gradient method with the OPTimal convergence rate
