DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Aviral Kumar; Abhishek Gupta; Sergey Levine

arXiv:2003.07305 · cs.LG · March 17, 2020 · 36 citations

TL;DR

DisCor is a distribution correction method for reinforcement learning. It re-weights the training data to counteract the mismatch between the experience the agent collects and the experience needed to correct errors in the Q-function, which improves stability and performance.

Contribution

The paper proposes DisCor, a novel algorithm that approximates an optimal data distribution correction to enhance RL training stability and effectiveness.
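To make the idea of re-weighting training data concrete, here is a minimal sketch of a weighted Bellman loss in which transitions with a larger estimated target error receive a smaller weight. The exponential form of the weights, the function names, the `temperature` parameter, and the error estimates are all assumptions made for illustration; this page does not state the weighting rule, so this should not be read as the authors' implementation.

```python
import math

def error_based_weights(next_errors, gamma=0.99, temperature=1.0):
    """Return normalized weights that shrink as the estimated target error grows.

    next_errors: hypothetical per-transition estimates of the error in the
    bootstrap target (one non-negative float per sampled transition).
    The exponential form is an illustrative assumption.
    """
    # Unnormalized log-weight: -gamma * error / temperature.
    logits = [-gamma * e / temperature for e in next_errors]
    # Subtract the largest logit for numerical stability before exponentiating.
    m = max(logits)
    unnorm = [math.exp(l - m) for l in logits]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def weighted_bellman_loss(q_values, targets, weights):
    """Weighted sum of squared Bellman errors over a sampled batch."""
    return sum(w * (q - t) ** 2 for q, t, w in zip(q_values, targets, weights))

# Toy batch: the third transition has a large estimated target error,
# so it should contribute the least to the loss.
errors = [0.1, 0.2, 5.0]
weights = error_based_weights(errors, gamma=0.99, temperature=1.0)
loss = weighted_bellman_loss([1.0, 0.5, 2.0], [0.8, 0.7, 4.0], weights)
```

In this sketch a lower `temperature` concentrates the weight on the transitions whose targets are believed to be most accurate, while a very high value approaches uniform weighting, which is ordinary replay sampling.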

Findings

01. DisCor significantly improves learning in noisy and sparse reward environments.

02. Theoretical analysis shows distribution correction reduces instability in Q-learning.

03. Empirical results demonstrate better convergence and performance across multiple tasks.

Abstract

Deep reinforcement learning can learn effective policies for a wide range of tasks, but is notoriously difficult to use due to instability and sensitivity to hyperparameters. The reasons for this remain unclear. When using standard supervised methods (e.g., for bandits), on-policy data collection provides "hard negatives" that correct the model in precisely those states and actions that the policy is likely to visit. We call this phenomenon "corrective feedback." We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from this corrective feedback, and training on the experience collected by the algorithm is not sufficient to correct errors in the Q-function. In fact, Q-learning and related methods can exhibit pathological interactions between the distribution of experience collected by the agent and the policy induced by training on that experience, leading to…

Code & Models

Videos

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Adaptive Dynamic Programming Control

Methods: Q-Learning