Gradient-Aware Model-based Policy Search
Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

TL;DR
This paper introduces GAMPS, a novel model-based policy search method that learns environment models focused on policy-relevant regions, improving policy optimization by weighting model errors based on their impact on the policy gradient.
Contribution
The paper proposes a new approach that explicitly considers the policy's influence on model learning, enhancing model-based reinforcement learning in misspecified environments.
Findings
GAMPS outperforms traditional methods on benchmark domains.
The weighting scheme concentrates model accuracy on the transitions that matter most for policy improvement.
Empirical results validate the effectiveness of policy-aware model learning.
Abstract
Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement. We leverage a weighting scheme, derived from the minimization of the error on the model-based policy gradient estimator, in order to define a suitable objective function that is optimized for learning the approximate transition model. Then, we integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware…
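The weighting idea described in the abstract can be illustrated with a small sketch. It rests on several assumptions. The state and action are scalars. The transition model is linear, s' ≈ k·s + m·a, and is fitted by weighted least squares, which is equivalent to weighted maximum likelihood for a Gaussian model with fixed variance. Each step t carries a hypothetical weight equal to γ^t times the running sum of policy-score norms ||∇ log π(a_l|s_l)|| for l = 0..t. The function names and this exact weight form are illustrative assumptions, not the paper's implementation.

```python
def gradient_aware_weights(score_norms, gamma):
    """Hypothetical per-step weights (illustrative assumption):
    gamma**t times the running sum of policy score norms for steps 0..t."""
    weights = []
    cumulative = 0.0
    for t, norm in enumerate(score_norms):
        cumulative += norm  # accumulate ||grad log pi(a_l | s_l)|| up to step t
        weights.append(gamma ** t * cumulative)
    return weights


def weighted_linear_fit(states, actions, next_states, weights):
    """Fit s' ~ k*s + m*a by weighted least squares.

    Transitions with larger weights pull the fit harder, so a misspecified
    (here: linear) model is most accurate where the weights are large.
    """
    # Weighted normal equations: [[sss, ssa], [ssa, saa]] @ [k, m] = [sys_, sya]
    sss = ssa = saa = sys_ = sya = 0.0
    for s, a, y, w in zip(states, actions, next_states, weights):
        sss += w * s * s
        ssa += w * s * a
        saa += w * a * a
        sys_ += w * s * y
        sya += w * a * y
    det = sss * saa - ssa * ssa
    if abs(det) < 1e-12:
        raise ValueError("weighted design matrix is singular")
    # Solve the 2x2 system with Cramer's rule
    k = (saa * sys_ - ssa * sya) / det
    m = (sss * sya - ssa * sys_) / det
    return k, m


# Usage: zero-weight transitions are ignored, so the fit matches only
# the two transitions that carry weight.
k, m = weighted_linear_fit([1.0, 0.0, 5.0, 3.0], [0.0, 1.0, 0.0, 1.0],
                           [2.0, 3.0, -7.0, 10.0], [1.0, 1.0, 0.0, 0.0])
print(k, m)
```

The design choice illustrated here is that the data are not reweighted uniformly. A policy-aware weight shifts the limited capacity of a misspecified model toward the transitions that most affect the policy gradient estimate.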
