Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning

Menglong Zhang; Fuyuan Qian

arXiv:2506.19785·cs.AI·June 25, 2025

TL;DR

This paper introduces SimBelief, a meta-reinforcement learning framework that measures task belief similarity using latent dynamics, improving rapid task identification and exploration in sparse reward environments.

Contribution

It proposes a novel latent task belief metric based on bisimulation-inspired similarity, enhancing task adaptation in Bayes-Adaptive MDPs.

Findings

01

Outperforms state-of-the-art methods on MuJoCo and panda-gym tasks

02

Effectively extracts common features of similar tasks

03

Enables efficient exploration in sparse reward settings

Abstract

Meta-reinforcement learning requires utilizing prior task distribution information obtained during exploration to rapidly adapt to unknown tasks. The efficiency of an agent's exploration hinges on accurately identifying the current task. Recent Bayes-Adaptive Deep RL approaches often rely on reconstructing the environment's reward signal, which is challenging in sparse reward settings, leading to suboptimal exploitation. Inspired by bisimulation metrics, which robustly extract behavioral similarity in continuous MDPs, we propose SimBelief, a novel meta-RL framework that measures the similarity of task beliefs in a Bayes-Adaptive MDP (BAMDP). SimBelief effectively extracts common features of similar task distributions, enabling efficient task identification and exploration in sparse reward environments. We introduce a latent task belief metric to learn the common structure of similar tasks and…
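To make the bisimulation idea in the abstract concrete, the sketch below computes a generic bisimulation-style distance: a reward difference plus a discounted Wasserstein distance between predicted next-state distributions. It is an illustration under stated assumptions, not SimBelief's actual latent task belief metric. The function names, the diagonal-Gaussian form of the predictions, and the discount value are all hypothetical.

```python
import math


def w2_diag_gaussian(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form 2-Wasserstein distance between two diagonal Gaussians."""
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mu_a, mu_b))
    std_term = sum((sa - sb) ** 2 for sa, sb in zip(sigma_a, sigma_b))
    return math.sqrt(mean_term + std_term)


def bisim_style_distance(reward_a, reward_b, next_a, next_b, gamma=0.9):
    """Reward gap plus discounted distance between next-state predictions.

    next_a and next_b are (mean, std) pairs of equal-length lists.
    This is the generic bisimulation-style form, assumed for illustration;
    the paper's metric over task beliefs may be defined differently.
    """
    (mu_a, sigma_a), (mu_b, sigma_b) = next_a, next_b
    reward_gap = abs(reward_a - reward_b)
    transition_gap = w2_diag_gaussian(mu_a, sigma_a, mu_b, sigma_b)
    return reward_gap + gamma * transition_gap


# Two hypothetical tasks evaluated at the same latent state and action.
task_a = (1.0, ([0.0, 0.0], [1.0, 1.0]))  # (reward, (next mean, next std))
task_b = (0.0, ([3.0, 4.0], [1.0, 1.0]))
distance = bisim_style_distance(task_a[0], task_b[0], task_a[1], task_b[1])
```

In a sparse reward setting the reward gap is zero almost everywhere, so in a distance of this form the transition term carries most of the signal that separates tasks.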

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 5

Strengths

- The proposed method is well formulated and clearly presented.
- The effectiveness of the latent task belief metric is validated by theoretical guarantee.
- Experiments demonstrate the superiority of the proposed method over strong baselines, especially the generalization capabilities to OOD testing tasks.

Weaknesses

- The topic of online meta-RL is kind of old.
- The core of the proposed method is using the reward and state transition functions $p(s', r \mid s, a)$, also called a world model, to measure task similarity in a latent space and hence infer the task belief for context-based meta-RL. This kind of task inference has been investigated by existing works like VariBAD.
- The baselines are kind of old, mainly 2019-2021.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

This paper is generally well-constructed and well-motivated and provides some theoretical background on the proposed method. In addition, the paper has the potential to help the community understand the use of latent embedding for task generalization.

Weaknesses

The paper uses many symbols, and without strict, consistent notation they can confuse readers. For example, in Fig. 2 $q_{\phi}$ appears to output $h$, but in Eq. (6) and elsewhere in the manuscript $q_{\phi}$ outputs $z_r$. Some notation is also used before it is properly defined, which hinders readers from fully understanding the content. The paper's reproducibility is questionable because the framework consists of various complex components.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

* Introducing ideas of modeling task similarity to meta-RL is interesting and worth exploring
* Sound experimental setup and results

Weaknesses

The biggest weakness is that the proposed method lacks motivation and reasoning about how it achieves the claims the authors make.
* The method combines the variational task belief $z_r$ with the proposed task belief similarity $z_l$ through a Gaussian mixture. This does not combine both types of task representations, but instead samples from one or the other. Furthermore, $z_r$ already models dynamics information in order to reconstruct the trajectory, so it's unclear what information $z_l$ adds.
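The reviewer's point about the Gaussian mixture can be illustrated with a short sketch: each draw from a two-component mixture comes from exactly one component, so a sample reflects either $z_r$ or $z_l$, never a blend of the two. The function name, the diagonal-Gaussian components, and the mixing weight are assumptions made for illustration; how SimBelief actually parameterizes its mixture is not stated here.

```python
import random


def sample_mixture(mu_r, sigma_r, mu_l, sigma_l, weight_r, rng):
    """Draw one latent from a two-component diagonal Gaussian mixture.

    Returns (sample, source), where source names the component used.
    weight_r is the (hypothetical) probability of picking the z_r component.
    """
    if rng.random() < weight_r:
        return [rng.gauss(m, s) for m, s in zip(mu_r, sigma_r)], "z_r"
    return [rng.gauss(m, s) for m, s in zip(mu_l, sigma_l)], "z_l"


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Well-separated components make the "one or the other" behaviour visible.
draws = [
    sample_mixture([-10.0], [0.1], [10.0], [0.1], weight_r=0.5, rng=rng)
    for _ in range(200)
]

# Every sample sits near -10 or near +10; none lands near the midpoint 0.
near_midpoint = [z for z, _ in draws if abs(z[0]) < 5.0]
```

A representation that truly fuses both beliefs would instead need something like concatenation or a product of the two densities, which is a different design choice from mixture sampling.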

Code & Models

Repositories

mlzhang-pr/simbelief
PyTorch · Official

Taxonomy

Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Reinforcement Learning in Robotics