Goal-Conditioned Q-Learning as Knowledge Distillation
Alexander Levine, Soheil Feizi

TL;DR
This paper applies knowledge distillation to goal-conditioned reinforcement learning and reports better performance and sample efficiency, especially in high-dimensional goal spaces.
Contribution
The paper applies Gradient-Based Attention Transfer to the Q-function update in goal-conditioned Q-learning, providing empirical improvements and theoretical insights into sample efficiency in high-dimensional settings.
Findings
Improved performance in high-dimensional goal spaces.
Efficient learning with multiple sparse goals.
Theoretical reduction in required replay buffer transitions.
Abstract
Many applications of reinforcement learning can be formalized as goal-conditioned environments, where, in each episode, there is a "goal" that affects the rewards obtained during that episode but does not affect the dynamics. Various techniques have been proposed to improve performance in goal-conditioned environments, such as automatic curriculum generation and goal relabeling. In this work, we explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation. In particular: the current Q-value function and the target Q-value estimate are both functions of the goal, and we would like to train the Q-value function to match its target for all goals. We therefore apply Gradient-Based Attention Transfer (Zagoruyko and Komodakis 2017), a knowledge distillation technique, to the Q-function update. We empirically show that this can improve…
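As an illustration of the idea described in the abstract, the sketch below adds a gradient-matching term to a squared Q-value error, so that the Q-function is pushed to agree with its target not only at the sampled goal but also in how it changes with the goal. This is a minimal toy under stated assumptions, not the authors' implementation: the function names (`numerical_grad`, `gradient_matched_loss`), the weight `alpha`, the squared-norm penalty, the finite-difference gradients, and the linear toy Q and target functions are all hypothetical choices made for the example.

```python
def numerical_grad(f, goal, eps=1e-5):
    """Central-difference gradient of a scalar function f with respect to the goal vector."""
    grad = []
    for i in range(len(goal)):
        hi = list(goal)
        lo = list(goal)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def gradient_matched_loss(q_fn, target_fn, goal, alpha=1.0):
    """Squared value error plus alpha times the squared error between goal-gradients.

    q_fn and target_fn map a goal vector to a scalar (state and action held fixed).
    alpha is a hypothetical weight on the gradient-matching term.
    """
    value_err = (q_fn(goal) - target_fn(goal)) ** 2
    grad_q = numerical_grad(q_fn, goal)
    grad_t = numerical_grad(target_fn, goal)
    grad_err = sum((a - b) ** 2 for a, b in zip(grad_q, grad_t))
    return value_err + alpha * grad_err


# Toy example: two linear functions of a 2-D goal that agree at goal = [1, 1]
# but depend on the goal in different ways.
def q_fn(g):
    return 1.0 * g[0] + 0.0 * g[1]


def target_fn(g):
    return 0.0 * g[0] + 1.0 * g[1]


goal = [1.0, 1.0]
plain = gradient_matched_loss(q_fn, target_fn, goal, alpha=0.0)    # value error only
matched = gradient_matched_loss(q_fn, target_fn, goal, alpha=0.5)  # adds gradient term
print(plain, matched)
```

In this toy case the value-only loss is zero at the sampled goal even though the two functions disagree at every nearby goal; the gradient term exposes that mismatch from a single sample. In practice the gradients would come from automatic differentiation rather than finite differences.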
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Batch Normalization · Weight Decay · Adam · Convolution · Dense Connections · Experience Replay · Deep Deterministic Policy Gradient · Knowledge Distillation
