Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning
Menglong Zhang, Fuyuan Qian

TL;DR
This paper introduces SimBelief, a meta-reinforcement learning framework that measures task belief similarity using latent dynamics, improving rapid task identification and exploration in sparse reward environments.
Contribution
It proposes a novel latent task belief metric based on bisimulation-inspired similarity, enhancing task adaptation in Bayes-Adaptive MDPs.
Findings
Outperforms state-of-the-art methods on MuJoCo and panda-gym tasks
Effectively extracts common features of similar tasks
Enables efficient exploration in sparse reward settings
Abstract
Meta-reinforcement learning requires using prior task-distribution information obtained during exploration to adapt rapidly to unknown tasks. The efficiency of an agent's exploration hinges on accurately identifying the current task. Recent Bayes-Adaptive Deep RL approaches often rely on reconstructing the environment's reward signal, which is challenging in sparse reward settings and leads to suboptimal exploitation. Inspired by bisimulation metrics, which robustly extract behavioral similarity in continuous MDPs, we propose SimBelief, a novel meta-RL framework that measures the similarity of task beliefs in a Bayes-Adaptive MDP (BAMDP). SimBelief effectively extracts common features of similar task distributions, enabling efficient task identification and exploration in sparse reward environments. We introduce a latent task belief metric to learn the common structure of similar tasks and…
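A bisimulation-style latent metric is commonly trained by regressing the distance between two latent representations onto a target built from the reward difference plus a discounted distance between their predicted transition distributions (the recipe popularized by Deep Bisimulation for Control). The sketch below is a minimal NumPy illustration under that assumption; it is not SimBelief's actual metric, and the function names, the L1 latent distance, the diagonal-Gaussian transition model, and `gamma` are all hypothetical choices made for illustration.

```python
import numpy as np


def w2_diag_gaussians(mu1, sigma1, mu2, sigma2):
    """Closed-form 2-Wasserstein distance between two diagonal Gaussians."""
    return np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((sigma1 - sigma2) ** 2))


def bisim_target(r_i, r_j, mu_i, sigma_i, mu_j, sigma_j, gamma=0.99):
    """Hypothetical bisimulation-style target: reward gap plus the discounted
    distance between predicted next-latent distributions (assumed Gaussian)."""
    return abs(r_i - r_j) + gamma * w2_diag_gaussians(mu_i, sigma_i, mu_j, sigma_j)


def bisim_loss(z_i, z_j, target):
    """Squared error between the L1 distance of two latent beliefs and the target.
    In practice the target would be treated as a constant (no gradient)."""
    latent_dist = np.sum(np.abs(z_i - z_j))
    return (latent_dist - target) ** 2
```

With identical rewards and identical predicted next-latent distributions the target is zero, so the loss pulls the two latent beliefs together; any reward gap or transition mismatch pushes them apart in proportion to that behavioral difference, which is what lets such a metric group similar tasks without reconstructing sparse rewards directly.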
Peer Reviews
Decision: ICLR 2025 Poster
- The proposed method is well formulated and clearly presented.
- The effectiveness of the latent task belief metric is supported by a theoretical guarantee.
- Experiments demonstrate the superiority of the proposed method over strong baselines, especially its generalization to out-of-distribution (OOD) test tasks.
- The topic of online meta-RL is somewhat dated.
- The core of the proposed method uses the reward and state-transition functions $p(s', r \mid s, a)$, also called a world model, to measure task similarity in a latent space and thereby infer the task belief for context-based meta-RL. This kind of task inference has already been investigated by existing works such as VariBAD.
- The baselines are somewhat dated, mainly from 2019 to 2021.
This paper is generally well-constructed and well-motivated and provides some theoretical background on the proposed method. In addition, the paper has the potential to help the community understand the use of latent embedding for task generalization.
The paper uses many symbolic notations, which can confuse readers because they are not applied strictly and consistently. For example, in Fig. 2, $q_{\phi}$ appears to output $h$, whereas in Eq. (6) and elsewhere in the manuscript, $q_{\phi}$ outputs $z_r$. Similarly, some notations are used before they are properly defined, which hinders readers from fully understanding the content. The paper's reproducibility is also questionable, as the framework consists of various complex components.
* Introducing ideas of modeling task similarity to meta-RL is interesting and worth exploring.
* Sound experimental setup and results.
The biggest weakness is that the proposed method lacks motivation and reasoning for how it achieves the claims the authors make.
* Combining the variational task belief $z_r$ with the proposed task belief similarity $z_l$ through a Gaussian mixture: this does not combine both types of task representation but instead samples from one or the other. Furthermore, $z_r$ already models dynamics information in order to reconstruct the trajectory, so it is unclear what information $z_l$ adds.
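The reviewer's point about the Gaussian mixture can be illustrated with a toy one-dimensional sketch. This is not the paper's code: the function names, the scalar beliefs, the mixture weight `w`, and the product-of-Gaussians alternative are all hypothetical choices for illustration. Each mixture draw comes from exactly one component, whereas a precision-weighted product yields a single Gaussian informed by both beliefs.

```python
import numpy as np


def sample_mixture(mu_r, sd_r, mu_l, sd_l, w, n, rng):
    """Two-component Gaussian mixture: each draw comes from z_r's Gaussian with
    probability w, otherwise from z_l's. Returns samples and the component mask."""
    pick_r = rng.random(n) < w
    z_r = rng.normal(mu_r, sd_r, size=n)
    z_l = rng.normal(mu_l, sd_l, size=n)
    return np.where(pick_r, z_r, z_l), pick_r


def product_of_gaussians(mu_r, sd_r, mu_l, sd_l):
    """Hypothetical alternative: precision-weighted fusion of both beliefs
    into a single Gaussian (mean, standard deviation)."""
    prec = 1.0 / sd_r**2 + 1.0 / sd_l**2
    mu = (mu_r / sd_r**2 + mu_l / sd_l**2) / prec
    return mu, np.sqrt(1.0 / prec)


rng = np.random.default_rng(0)
samples, picked_r = sample_mixture(-10.0, 0.1, 10.0, 0.1, 0.5, 1000, rng)
fused_mu, fused_sd = product_of_gaussians(-10.0, 1.0, 10.0, 1.0)
```

With well-separated components, every mixture sample lands near either -10 or 10 and none near the midpoint, so no single sample carries information from both beliefs; the fused Gaussian instead centers at 0 with a smaller standard deviation than either input.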
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Reinforcement Learning in Robotics
