Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning

Parth Chadha

arXiv:2005.13778·cs.LG·May 29, 2020

TL;DR

This paper introduces a gradient matching method that leverages domain knowledge from a dynamics predictor to enhance sample efficiency in reinforcement learning, combining model-based and model-free approaches effectively.

Contribution

The paper proposes a gradient matching algorithm that integrates domain knowledge from a dynamics predictor into model-free RL, with the aim of improving sample efficiency and reducing asymptotic bias.

Findings

01

Improved sample efficiency demonstrated in experiments

02

Effective integration of model-based and model-free learning

03

Reduced asymptotic bias in reinforcement learning

Abstract

Model-free deep reinforcement learning (RL) agents can learn an effective policy directly from repeated interactions with a black-box environment. In practice, however, these algorithms often require large amounts of training experience to learn and generalize well. In addition, classic model-free learning ignores the domain information contained in the state transition tuples. Model-based RL, on the other hand, attempts to learn a model of the environment from experience and is substantially more sample efficient, but suffers from large asymptotic bias owing to the imperfect dynamics model. In this paper, we propose a gradient matching algorithm to improve sample efficiency by utilizing target slope information from the dynamics predictor to aid the model-free learner. We demonstrate this by presenting a technique for matching the gradient information from the model-based…
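
The abstract is truncated here, so the exact form of the paper's matching objective is not available on this page. As a rough illustration of the general idea of using "target slope information" from one predictor to guide another, the sketch below adds a slope-matching penalty to an ordinary regression loss. Everything in it is an assumption made for illustration: the function names (finite_diff_slope, gradient_matching_loss, combined_loss), the scalar inputs, the finite-difference slope estimate, and the weight parameter are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: a slope-matching penalty between a "student"
# (standing in for the model-free learner) and a "teacher" (standing in for
# the dynamics predictor). This is NOT the paper's algorithm; names, scalar
# inputs, and the finite-difference estimate are assumptions.

def finite_diff_slope(fn, x, eps=1e-4):
    """Central finite-difference estimate of d fn / d x at a scalar x."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


def gradient_matching_loss(student, teacher, xs, eps=1e-4):
    """Mean squared difference between student and teacher slopes over xs."""
    total = 0.0
    for x in xs:
        diff = finite_diff_slope(student, x, eps) - finite_diff_slope(teacher, x, eps)
        total += diff ** 2
    return total / len(xs)


def combined_loss(student, targets, teacher, xs, weight=0.5):
    """Regression loss on targets plus a weighted slope-matching penalty.

    `weight` is a hypothetical trade-off coefficient.
    """
    fit = sum((student(x) - y) ** 2 for x, y in zip(xs, targets)) / len(xs)
    return fit + weight * gradient_matching_loss(student, teacher, xs, eps=1e-4)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0]
    teacher = lambda x: 2.0 * x + 1.0          # hypothetical dynamics predictor
    same_slope = lambda x: 2.0 * x - 3.0       # differs in value, matches in slope
    wrong_slope = lambda x: 5.0 * x            # mismatched slope

    print(gradient_matching_loss(same_slope, teacher, xs))
    print(gradient_matching_loss(wrong_slope, teacher, xs))
```

The penalty depends only on slopes, so a student that is offset from the teacher by a constant incurs essentially no matching cost, while a student with a different slope is penalized by the squared slope gap. In a real agent the slopes would more plausibly come from automatic differentiation of neural networks than from finite differences.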

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Domain Adaptation and Few-Shot Learning