MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning

Zohar Rimon; Tom Jurgenson; Orr Krupnik; Gilad Adler; Aviv Tamar

arXiv:2403.09859·cs.LG·March 18, 2024·2 cites

TL;DR

This paper introduces MAMBA, a model-based meta-reinforcement learning approach that significantly improves sample efficiency and performance on benchmark domains, advancing towards real-world applicability.

Contribution

MAMBA combines state-of-the-art model-based and meta-RL techniques to enhance sample efficiency and scalability in meta-RL tasks, especially in high-dimensional environments.

Findings

1. Achieves up to 15x better sample efficiency.

2. Attains higher returns on benchmark domains.

3. Performs well with minimal hyperparameter tuning.

Abstract

Meta-reinforcement learning (meta-RL) is a promising framework for tackling challenging domains requiring efficient exploration. Existing meta-RL algorithms are characterized by low sample efficiency, and mostly focus on low-dimensional task distributions. In parallel, model-based RL methods have been successful in solving partially observable MDPs, of which meta-RL is a special case. In this work, we leverage this success and propose a new model-based approach to meta-RL, based on elements from existing state-of-the-art model-based and meta-RL methods. We demonstrate the effectiveness of our approach on common meta-RL benchmark domains, attaining greater return with better sample efficiency (up to $15\times$) while requiring very little hyperparameter tuning. In addition, we validate our approach on a slate of more challenging, higher-dimensional domains, taking a step towards…
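
The abstract's remark that meta-RL is a special case of a partially observable MDP can be illustrated with a toy sketch. This is not MAMBA and not taken from the paper: the environment `HiddenGoalLine`, the policy `history_policy`, and the helper `rollout` are hypothetical names invented for this illustration. The task (a goal at one end of a line) is hidden from the agent, so a good policy has to act on the observation history rather than on the current observation alone.

```python
import random
from dataclasses import dataclass, field


@dataclass
class HiddenGoalLine:
    """Hypothetical toy task family: the goal sits at one end of a line.

    The goal is the hidden task variable; the agent only sees
    (position, previous reward), so the problem is partially observable.
    """
    size: int = 5
    horizon: int = 20
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def reset_task(self):
        # Sample a new task; the goal is never exposed in the observation.
        self.goal = self.rng.choice([-self.size, self.size])
        return self.reset()

    def reset(self):
        self.pos, self.t = 0, 0
        return (self.pos, 0.0)

    def step(self, action):
        # action is -1 (left), 0 (stay), or +1 (right)
        self.pos = max(-self.size, min(self.size, self.pos + action))
        self.t += 1
        reward = 1.0 if self.pos == self.goal else 0.0
        done = self.t >= self.horizon
        return (self.pos, reward), reward, done


def history_policy(history, size):
    """Pick an action from the full observation history (a crude belief)."""
    _, last_reward = history[-1]
    if last_reward > 0:
        return 0   # already on the goal: stay there
    if (size, 0.0) in history:
        return -1  # right end was visited without reward: goal is on the left
    return +1      # nothing ruled out yet: explore to the right


def rollout(env):
    """Run one episode on the current task and return the total reward."""
    history = [env.reset()]
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(history_policy(history, env.size))
        history.append(obs)
        total += reward
    return total
```

Calling `env.reset_task()` followed by `rollout(env)` runs one episode on a freshly sampled task. A policy that looked only at the current position could not tell the two tasks apart, which is the sense in which the hidden task makes the problem partially observable.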

Peer Reviews

Decision·ICLR 2024 poster

Reviewer 01 · Rating 6 · marginally above the acceptance threshold · Confidence 3

Strengths

1. The algorithm is sample efficient when compared to other meta-RL algorithms.
2. The authors conduct a good number of simulations to explain their algorithm, and evaluate its performance.
3. The paper is generally well written and easy to follow.

Weaknesses

**The assumption of task decomposability and task independence is strong, vague, and confusing** The paper assumes scenarios of task decomposability, where each task is decomposed into independent tasks. I think this is a pretty strong assumption, and not many environments will satisfy this criterion. The example the authors give for task decomposability is a little confusing: a robot required to solve several independent problems in a sequence. Isn

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

*Clarity* - This work is well-written and easy to understand. The problem this paper attempts to solve and its proposed algorithm are clearly presented, which makes it easier to reproduce.
*Originality and Significance* - The proposed changes appear to be fairly minor: augmenting the state with these additional observations is something that happens in a fair number of other papers; and increasing the history length for computing the context variable is akin to increasing a hyperparameter

Weaknesses

I think this work already provides valuable contributions, but it can be strengthened significantly, primarily by shedding more light on its results.
- MAMBA proposes 3 changes over Dreamer. It would be very helpful to perform an ablation study on these 3 changes and understand which ones are most important and how they impact performance.
- This work finds that Dreamer performs better than VariBAD and HyperX out-of-the-box on meta-RL tasks. I find the paper's claim that this performance gap probabl

Reviewer 03 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

Based on the evidence provided, it is clear that within the domains examined, MAMBA is a stronger meta-RL algorithm than those tested against, both in terms of returns and sample efficiency. The results are clearly presented and the explanations are generally sufficient.

Weaknesses

While overall the paper is good, and provides good evidence within the context, it seems that there are a number of approaches within the literature which have not been covered. These include:
1) Pinon et al, A model-based approach to meta-reinforcement learning: transformers and tree search, https://arxiv.org/pdf/2208.11535.pdf
2) Wang and Hoof, Model-based meta reinforcement learning using graph structured surrogate models and amortized policy search, https://proceedings.mlr.press/v162/wang22z

Code & Models

Repositories

zoharri/mamba
pytorch · Official

Videos

MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning· slideslive

Taxonomy

Topics: Mental Health Research Topics

Methods: Focus