PrivilegedDreamer: Explicit Imagination of Privileged Information for Rapid Adaptation of Learned Policies

Morgan Byrd; Jackson Crandell; Mili Das; Jessica Inman; Robert Wright; Sehoon Ha

arXiv:2502.11377·cs.RO·February 19, 2025

Open Access

TL;DR

PrivilegedDreamer is a model-based reinforcement learning framework that explicitly estimates hidden parameters in decision problems, enabling rapid adaptation and outperforming existing methods across diverse tasks.

Contribution

The paper introduces a novel dual recurrent architecture for explicit hidden-parameter estimation and conditioning in model-based RL, improving adaptation in HIP-MDPs.

Findings

01 Outperforms state-of-the-art algorithms on five HIP-MDP tasks

02 Effective hidden parameter estimation from limited data

03 Ablation studies validate architecture components

Abstract

Numerous real-world control problems involve dynamics and objectives affected by unobservable hidden parameters, ranging from autonomous driving to robotic manipulation, which cause performance degradation during sim-to-real transfer. To represent these kinds of domains, we adopt hidden-parameter Markov decision processes (HIP-MDPs), which model sequential decision problems where hidden variables parameterize transition and reward functions. Existing approaches, such as domain randomization, domain adaptation, and meta-learning, simply treat the effect of hidden parameters as additional variance and often struggle to effectively handle HIP-MDP problems, especially when the rewards are parameterized by hidden variables. We introduce Privileged-Dreamer, a model-based reinforcement learning framework that extends the existing model-based approach by incorporating an explicit parameter…
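To make the HIP-MDP setting concrete, here is a minimal sketch of a toy problem in which one hidden parameter shapes both the transition and the reward. Everything in it is an assumption made for illustration: the environment, the names ToyHipMdp, sample_task, and estimate_theta, and the parameter range do not come from the paper. The one-step probe estimator is a simple system-identification stand-in, not the paper's recurrent estimation module.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyHipMdp:
    """Hypothetical 1-D point mass; theta is the hidden parameter."""

    theta: float           # hidden parameter, not part of the observation
    position: float = 0.0  # observable state

    def step(self, action: float) -> tuple[float, float]:
        # Transition depends on theta: the same action moves the mass
        # by a different amount in each task.
        self.position += self.theta * action
        # Reward depends on theta too: the target location is theta.
        reward = -abs(self.position - self.theta)
        return self.position, reward


def sample_task(rng: random.Random) -> ToyHipMdp:
    # Each episode draws a fresh hidden parameter (range is an assumption).
    return ToyHipMdp(theta=rng.uniform(0.5, 2.0))


def estimate_theta(env: ToyHipMdp, probe_action: float = 1.0) -> float:
    # Explicit estimate from a single observed transition:
    # theta = (change in position) / action.
    before = env.position
    after, _ = env.step(probe_action)
    return (after - before) / probe_action


if __name__ == "__main__":
    env = sample_task(random.Random(0))
    theta_hat = estimate_theta(env)
    # The policy never reads env.theta; it conditions on the estimate.
    print(f"estimated theta: {theta_hat:.3f}")
```

In this toy case a policy that treats theta only as extra noise cannot tell where the reward target lies, whereas a policy conditioned on the explicit estimate can, which is the situation the abstract describes for rewards parameterized by hidden variables.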

Taxonomy

Topics: Complex Systems and Decision Making

Methods: ADaptive gradient method with the OPTimal convergence rate