Gradient-Aware Model-based Policy Search

Pierluca D'Oro; Alberto Maria Metelli; Andrea Tirinzoni; Matteo Papini; Marcello Restelli

arXiv:1909.04115·cs.LG·October 20, 2020

TL;DR

This paper introduces GAMPS, a novel model-based policy search method that learns environment models focused on policy-relevant regions, improving policy optimization by weighting model errors based on their impact on the policy gradient.

Contribution

The paper proposes an approach that uses the agent's current policy to guide model learning, improving model-based reinforcement learning when the model class is misspecified.

Findings

01

GAMPS outperforms traditional methods on benchmark domains.

02

The weighting scheme concentrates model accuracy on the portions of the environment most relevant for policy improvement.

03

Empirical results validate the effectiveness of policy-aware model learning.

Abstract

Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement. We leverage a weighting scheme, derived from the minimization of the error on the model-based policy gradient estimator, in order to define a suitable objective function that is optimized for learning the approximate transition model. Then, we integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware…
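The weighting idea in the abstract can be illustrated with a small sketch. The abstract as shown here is truncated and does not state the form of the weights, so the code below assumes one plausible form: each transition at time t is weighted by the discount factor raised to t, times a cumulative importance ratio, times the norm of the summed policy score vectors up to t. The function names `gradient_aware_weights` and `fit_weighted_tabular_model`, the tabular model, and the uniform fallback for unvisited state-action pairs are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def gradient_aware_weights(scores, gamma, is_ratios=None):
    """Per-transition weights for one trajectory (assumed form, see lead-in).

    scores:    (T, d) array, row t is grad_theta log pi(a_t | s_t).
    gamma:     discount factor in [0, 1].
    is_ratios: optional (T,) per-step importance ratios pi / pi_behavior;
               treated as all ones (on-policy data) when omitted.
    Returns a (T,) array: gamma^t * prod_{l<=t} ratio_l * ||sum_{l<=t} score_l||_2.
    """
    scores = np.asarray(scores, dtype=float)
    horizon = scores.shape[0]
    cumulative_scores = np.cumsum(scores, axis=0)
    score_norms = np.linalg.norm(cumulative_scores, axis=1)
    discounts = gamma ** np.arange(horizon)
    if is_ratios is None:
        cumulative_ratios = np.ones(horizon)
    else:
        cumulative_ratios = np.cumprod(np.asarray(is_ratios, dtype=float))
    return discounts * cumulative_ratios * score_norms


def fit_weighted_tabular_model(transitions, weights, n_states, n_actions):
    """Weighted maximum likelihood for a tabular transition model.

    transitions: iterable of (s, a, s_next) integer triples.
    weights:     one non-negative weight per transition.
    For a tabular model, the weighted MLE is the normalized weighted count.
    State-action pairs with zero total weight fall back to a uniform
    distribution (an illustrative choice).
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for (s, a, s_next), w in zip(transitions, weights):
        counts[s, a, s_next] += w
    totals = counts.sum(axis=2, keepdims=True)
    safe_totals = np.where(totals > 0, totals, 1.0)
    return np.where(totals > 0, counts / safe_totals, 1.0 / n_states)
```

Under this assumed form, a transition receives weight zero when the score vectors accumulated up to that step cancel out, so the fitted model spends none of its limited capacity on it. Plain maximum likelihood corresponds to setting every weight to one.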
