TL;DR
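This paper introduces DR.Q, a model-based representation method for continuous control that maximizes the mutual information between the representations of the current state-action pair and the next state and uses prioritized replay, matching or surpassing strong baselines on continuous control benchmarks.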
Contribution
The paper proposes DR.Q (Debiased model-based Representations for Q-learning), an approach that explicitly maximizes mutual information in model-based representations for sample-efficient continuous control.
Findings
DR.Q matches or surpasses strong baselines on continuous control benchmarks.
DR.Q outperforms existing methods with a single hyperparameter setting.
Code for DR.Q is publicly available at the provided GitHub link.
Abstract
Model-based representations have recently emerged as a promising framework that embeds latent dynamics information into the representations used for downstream off-policy actor-critic learning. This framework implicitly combines the advantages of model-free and model-based approaches while avoiding the training costs associated with model-based methods. Nevertheless, existing model-based representation methods can fail to capture sufficient information about relevant variables and can overfit to early experiences in the replay buffer. These issues introduce biases into representation and actor-critic learning, leading to inferior performance. To address this, we propose Debiased model-based Representations for Q-learning, dubbed DR.Q. DR.Q explicitly maximizes the mutual information between the representations of the current state-action pair and the next state, in addition to minimizing their deviations, and…
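The objective described in the abstract, minimizing the deviation between the two representations while maximizing their mutual information, can be illustrated with a minimal sketch. The abstract excerpt does not say which mutual information estimator DR.Q uses, so this sketch assumes an InfoNCE-style lower bound. The function names, the `temperature` and `mi_weight` parameters, and the use of mean squared error for the deviation term are all hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def info_nce_lower_bound(z_sa, z_next, temperature=0.1):
    """InfoNCE lower bound on mutual information (assumed estimator).

    z_sa:   (B, D) representations of state-action pairs.
    z_next: (B, D) representations of next states; row i is the positive for row i.
    """
    # Normalize rows so that logits are scaled cosine similarities.
    z_sa = z_sa / np.linalg.norm(z_sa, axis=1, keepdims=True)
    z_next = z_next / np.linalg.norm(z_next, axis=1, keepdims=True)

    logits = z_sa @ z_next.T / temperature            # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Mean log-probability of the positive pair, plus log(B), gives the bound.
    return float(np.mean(np.diag(log_softmax)) + np.log(len(z_sa)))


def representation_loss(z_sa, z_next, mi_weight=1.0):
    """Deviation term minus a weighted MI bound (hypothetical combination)."""
    deviation = float(np.mean((z_sa - z_next) ** 2))  # assumed MSE deviation
    return deviation - mi_weight * info_nce_lower_bound(z_sa, z_next)


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
aligned = info_nce_lower_bound(z, z)          # each row paired with itself
shuffled = info_nce_lower_bound(z, z[::-1])   # positives deliberately mismatched
```

In this sketch the bound is capped at log(B) for a batch of size B, and correctly paired representations score higher than mismatched ones, which is the signal a contrastive mutual information term provides on top of a plain deviation loss.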
