Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models

Qi Wang; Herke van Hoof

arXiv:2102.08291 · cs.LG · February 17, 2021 · 1 citation

TL;DR

This paper introduces a graph-structured surrogate model for model-based meta reinforcement learning, improving dynamics prediction and enabling fast, high-return decision-making across tasks.

Contribution

It proposes a graph structured surrogate model (GSSM) that improves dynamics modeling, and combines it with amortized policy optimization in a Thompson-sampling-based approach to meta RL.

Findings

1. GSSM outperforms existing models in environment dynamics prediction.
2. The approach achieves high returns without test-time policy gradient optimization.
3. It enables fast deployment with improved generalization across tasks.

Abstract

Reinforcement learning is a promising paradigm for solving sequential decision-making problems, but low data efficiency and weak generalization across tasks are bottlenecks in real-world applications. Model-based meta reinforcement learning addresses these issues by learning dynamics and leveraging knowledge from prior experience. In this paper, we take a closer look at this framework, and propose a new Thompson-sampling based approach that consists of a new model to identify task dynamics together with an amortized policy optimization step. We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics. Additionally, our approach is able to obtain high returns, while allowing fast execution during deployment by avoiding test time policy gradient optimization.
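The abstract describes the approach only at a high level: infer the task's dynamics from data, draw a plausible task from that belief (Thompson sampling), and act with a policy conditioned on the drawn task, so no policy gradient steps are needed at test time. The sketch below illustrates that loop on a toy problem and is not GSSM. As assumptions for illustration, a conjugate Gaussian posterior over one scalar parameter `theta` (dynamics `s' = s + theta * a + noise`) stands in for the learned graph structured surrogate model, and a hypothetical closed-form controller `amortized_policy` stands in for the learned task-conditioned policy. `GaussianPosterior`, `run_episodes`, the prior, and all constants are likewise hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class GaussianPosterior:
    """Gaussian belief over a scalar task parameter theta.

    Toy stand-in for a learned dynamics model (assumption, not GSSM):
    transitions follow s' = s + theta * a + noise, noise ~ N(0, noise_var).
    """
    mean: float
    var: float
    noise_var: float

    def sample(self, rng: random.Random) -> float:
        # Thompson sampling: draw one plausible task from the current belief.
        return rng.gauss(self.mean, math.sqrt(self.var))

    def update(self, action: float, delta: float) -> None:
        # Conjugate Bayesian linear-regression update for delta = theta * action + noise.
        precision = 1.0 / self.var + action * action / self.noise_var
        self.mean = (self.mean / self.var + action * delta / self.noise_var) / precision
        self.var = 1.0 / precision


def amortized_policy(state: float, goal: float, theta: float, a_max: float = 1.0) -> float:
    """Hypothetical task-conditioned policy: a closed-form evaluation, no test-time gradient steps."""
    if abs(theta) < 1e-6:
        return 0.0  # sampled task says actions have no effect; do nothing
    action = (goal - state) / theta  # step that would reach the goal if theta were correct
    return max(-a_max, min(a_max, action))


def run_episodes(true_theta: float, n_episodes: int = 5, horizon: int = 20,
                 goal: float = 5.0, seed: int = 0) -> tuple[GaussianPosterior, list[float]]:
    rng = random.Random(seed)
    belief = GaussianPosterior(mean=1.0, var=4.0, noise_var=0.01)  # assumed prior
    final_errors = []
    for _ in range(n_episodes):
        theta_sample = belief.sample(rng)  # one sample per episode, held fixed
        state = 0.0
        for _ in range(horizon):
            action = amortized_policy(state, goal, theta_sample)
            # Simulated environment; its noise level is assumed known to the model.
            delta = true_theta * action + rng.gauss(0.0, math.sqrt(belief.noise_var))
            belief.update(action, delta)  # refine the task belief from real transitions
            state += delta
        final_errors.append(abs(goal - state))
    return belief, final_errors
```

Calling `run_episodes(true_theta=2.0)` can give a large error in the first episode, when the sampled `theta` may be far off, and small errors once the posterior has concentrated. Acting on a single draw per episode, rather than on the posterior mean, is what makes this Thompson sampling: uncertainty in the belief drives exploration without an explicit bonus.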


Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications