DIRECT: Learning from Sparse and Shifting Rewards using Discriminative Reward Co-Training

Philipp Altmann; Thomy Phan; Fabian Ritz; Thomas Gabor; Claudia Linnhoff-Popien

arXiv:2301.07421·cs.LG·January 19, 2023·1 cites

Open Access

TL;DR

DIRECT introduces a discriminative reward co-training method that leverages beneficial past trajectories and a discriminator network to improve reinforcement learning in environments with sparse and shifting rewards.

Contribution

The paper presents a novel discriminative reward co-training approach that enhances deep reinforcement learning by using a discriminator to guide policy optimization with beneficial past experiences.

Findings

01. Outperforms state-of-the-art algorithms in sparse-reward environments
02. Effectively handles shifting reward landscapes
03. Provides a surrogate reward to improve policy learning

Abstract

We propose discriminative reward co-training (DIRECT) as an extension to deep reinforcement learning algorithms. Building upon the concept of self-imitation learning (SIL), we introduce an imitation buffer to store trajectories generated by the policy that are deemed beneficial based on their return. A discriminator network is trained concurrently with the policy to distinguish between trajectories generated by the current policy and beneficial trajectories generated by previous policies. The discriminator's verdict is used to construct a reward signal for optimizing the policy. By interpolating prior experience, DIRECT is able to act as a surrogate, steering policy optimization towards more valuable regions of the reward landscape and thus learning an optimal policy. Our results show that DIRECT outperforms state-of-the-art algorithms in sparse- and shifting-reward environments being able to provide a…
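The two mechanisms named in the abstract, a return-ranked imitation buffer and a discriminator-derived reward, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `ImitationBuffer`, the function `surrogate_reward`, the fixed `capacity`, and the linear `mix` weight are all hypothetical names and choices made for illustration. The sketch assumes the discriminator outputs a probability in [0, 1] that a transition came from the buffer, and it leaves out the discriminator network and the policy update entirely.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class ImitationBuffer:
    """Keeps the `capacity` highest-return trajectories seen so far.

    Hypothetical helper for illustration only; the paper's buffer may
    differ in how it ranks, stores, and replaces trajectories.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Min-heap of (return, insertion_index, trajectory); the root is
        # always the weakest stored trajectory.
        self._heap: List[Tuple[float, int, Any]] = []
        # Unique tie-breaker so trajectories themselves are never compared.
        self._counter = itertools.count()

    def add(self, trajectory: Any, episode_return: float) -> bool:
        """Store the trajectory if it beats the weakest entry.

        Returns True if the trajectory was stored, False otherwise.
        """
        entry = (episode_return, next(self._counter), trajectory)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if episode_return > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)
            return True
        return False

    def returns(self) -> List[float]:
        """Returns of the stored trajectories, best first."""
        return sorted((ret for ret, _, _ in self._heap), reverse=True)

    def trajectories(self) -> List[Any]:
        """Stored trajectories, best first."""
        return [traj for _, _, traj in sorted(self._heap, reverse=True)]


def surrogate_reward(disc_prob: float, env_reward: float, mix: float = 0.5) -> float:
    """Blend the discriminator's verdict with the environment reward.

    `disc_prob` is assumed to be the discriminator's probability that the
    transition came from the imitation buffer. The linear blend and the
    `mix` weight are assumptions of this sketch.
    """
    return mix * disc_prob + (1.0 - mix) * env_reward


if __name__ == "__main__":
    buffer = ImitationBuffer(capacity=2)
    for steps, ret in [(["a"], 1.0), (["b"], 3.0), (["c"], 0.5), (["d"], 2.0)]:
        buffer.add(steps, ret)
    print(buffer.returns())
    print(surrogate_reward(disc_prob=0.8, env_reward=0.0, mix=1.0))
```

With `mix=1.0` the policy is trained purely on the discriminator's verdict, which is where a sparse environment reward gives little signal of its own; smaller values retain some of the environment reward.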
