Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning
Parth Chadha

TL;DR
This paper introduces a gradient matching method that leverages domain knowledge from a dynamics predictor to enhance sample efficiency in reinforcement learning, combining model-based and model-free approaches effectively.
Contribution
It proposes a novel gradient matching algorithm that integrates domain knowledge into model-free RL to improve sample efficiency and reduce asymptotic bias.
Findings
Improved sample efficiency demonstrated in experiments
Effective integration of model-based and model-free learning
Reduced asymptotic bias in reinforcement learning
Abstract
Model-free deep reinforcement learning (RL) agents can learn an effective policy directly from repeated interactions with a black-box environment. However, in practice, the algorithms often require large amounts of training experience to learn and generalize well. In addition, classic model-free learning ignores the domain information contained in the state transition tuples. Model-based RL, on the other hand, attempts to learn a model of the environment from experience and is substantially more sample efficient, but suffers from significant asymptotic bias owing to its imperfect dynamics model. In this paper, we propose a gradient matching algorithm to improve sample efficiency by utilizing target slope information from the dynamics predictor to aid the model-free learner. We demonstrate this by presenting a technique for matching the gradient information from the model-based…
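The general idea of gradient matching can be illustrated with a toy regression. The sketch below is not the paper's algorithm: it is a generic Sobolev-style loss that penalizes both value errors and slope errors, under loud assumptions. The "model-free learner" is a hypothetical quadratic `f(x) = t0 + t1*x + t2*x^2`, and the "dynamics predictor" is replaced by hypothetical oracle functions `target_value` and `target_slope`. The names `fit`, `lam`, `lr` and `steps` are illustrative only.

```python
# Toy sketch of a gradient-matching (Sobolev-style) regression loss.
# Assumptions (hypothetical, not from the paper): the learner is a quadratic
# f(x) = t0 + t1*x + t2*x^2, and target values/slopes come from known oracle
# functions standing in for information a model-based component might supply.

def target_value(x):
    # Hypothetical ground-truth function the learner should recover.
    return 1.0 + 2.0 * x + 3.0 * x * x

def target_slope(x):
    # Hypothetical slope signal: the derivative of target_value.
    return 2.0 + 6.0 * x

def fit(xs, lam, lr=0.05, steps=5000):
    """Gradient descent on mean[(f - y)^2 + lam * (f' - g)^2] over xs."""
    t = [0.0, 0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x in xs:
            f = t[0] + t[1] * x + t[2] * x * x
            df = t[1] + 2.0 * t[2] * x        # df/dx of the learner
            r_val = f - target_value(x)       # value residual
            r_slope = df - target_slope(x)    # slope residual
            grad[0] += 2.0 * r_val
            grad[1] += 2.0 * r_val * x + lam * 2.0 * r_slope
            grad[2] += 2.0 * r_val * x * x + lam * 2.0 * r_slope * 2.0 * x
        # Average the per-sample gradients and take one descent step.
        t = [ti - lr * gi / n for ti, gi in zip(t, grad)]
    return t

xs = [-1.0, 1.0]  # only two samples
values_only = fit(xs, lam=0.0)
with_slopes = fit(xs, lam=1.0)
print("values only:", [round(v, 3) for v in values_only])  # [2.0, 2.0, 2.0]
print("with slopes:", [round(v, 3) for v in with_slopes])  # [1.0, 2.0, 3.0]
```

With two samples, fitting values alone is underdetermined: gradient descent from zero settles on `2 + 2x + 2x^2`, which also passes through both points but is the wrong function. Adding the slope term pins down all three coefficients, which is the intuition behind using extra gradient information to learn from fewer samples.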
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Domain Adaptation and Few-Shot Learning
