Goal-Conditioned Q-Learning as Knowledge Distillation

Alexander Levine; Soheil Feizi

arXiv:2208.13298 · cs.LG · March 9, 2023

TL;DR

This paper applies a knowledge distillation technique, Gradient-Based Attention Transfer, to the Q-function update in off-policy goal-conditioned reinforcement learning. It reports gains in performance and sample efficiency, especially when the goal space is high-dimensional.

Contribution

The paper applies Gradient-Based Attention Transfer (Zagoruyko and Komodakis 2017) to goal-conditioned Q-learning, reports empirical improvements, and gives a theoretical analysis of sample efficiency in high-dimensional goal spaces.

Findings

01 Improved performance in high-dimensional goal spaces.

02 Efficient learning with multiple sparse goals.

03 Theoretical reduction in required replay buffer transitions.

Abstract

Many applications of reinforcement learning can be formalized as goal-conditioned environments, where, in each episode, there is a "goal" that affects the rewards obtained during that episode but does not affect the dynamics. Various techniques have been proposed to improve performance in goal-conditioned environments, such as automatic curriculum generation and goal relabeling. In this work, we explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation. In particular: the current Q-value function and the target Q-value estimate are both functions of the goal, and we would like to train the Q-value function to match its target for all goals. We therefore apply Gradient-Based Attention Transfer (Zagoruyko and Komodakis 2017), a knowledge distillation technique, to the Q-function update. We empirically show that this can improve…
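The idea in the abstract, matching the Q-value to its target as a function of the goal rather than at a single goal, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `alpha` weight, the squared L2 penalty, and the toy linear functions are all assumptions made for illustration, and the paper's exact loss may differ. Goal-gradients are approximated with finite differences so the example needs only the standard library; a real implementation would presumably use automatic differentiation.

```python
from typing import Callable, List, Sequence

# A scalar function of the goal, with state and action held fixed.
GoalFn = Callable[[Sequence[float]], float]


def goal_gradient(f: GoalFn, goal: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Central finite-difference gradient of a scalar function with respect to the goal."""
    grad = []
    for i in range(len(goal)):
        up, down = list(goal), list(goal)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def gradient_matching_q_loss(q_fn: GoalFn, target_fn: GoalFn,
                             goal: Sequence[float], alpha: float = 1.0) -> float:
    """Squared TD error plus alpha times the squared L2 gap between goal-gradients.

    Hypothetical loss form: the second term plays the role of the
    attention-transfer penalty, with the target as teacher and Q as student.
    """
    td_error = q_fn(goal) - target_fn(goal)
    grad_q = goal_gradient(q_fn, goal)
    grad_target = goal_gradient(target_fn, goal)
    grad_gap = sum((a - b) ** 2 for a, b in zip(grad_q, grad_target))
    return td_error ** 2 + alpha * grad_gap


# Toy stand-ins (hypothetical, not from the paper): both are linear in the goal.
def toy_q(goal: Sequence[float]) -> float:
    return 2.0 * goal[0] + 3.0 * goal[1]


def toy_target(goal: Sequence[float]) -> float:
    return 1.0 + 1.0 * goal[0] + 3.0 * goal[1]


goal = [1.0, 0.0]
# The two functions agree in value at this goal, so the plain TD loss vanishes.
plain_loss = gradient_matching_q_loss(toy_q, toy_target, goal, alpha=0.0)
# Their slopes in the first goal coordinate differ, which the gradient term penalizes.
matched_loss = gradient_matching_q_loss(toy_q, toy_target, goal, alpha=1.0)
```

In this toy case the value-only loss sees no error at the sampled goal even though the two functions disagree at every nearby goal. The gradient term picks up that disagreement from a single sample, which is the intuition behind treating the update as distillation over the goal space.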

Code & Models

Repositories

alevine0/reengage · tf · Official

Videos

Goal-Conditioned Q-Learning as Knowledge Distillation · Underline

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Batch Normalization · Weight Decay · Adam · Convolution · Dense Connections · Experience Replay · Deep Deterministic Policy Gradient · Knowledge Distillation