Critic Algorithms using Cooperative Networks

Debangshu Banerjee; Kavita Wagh

arXiv:2201.07839·cs.AI·January 21, 2022

TL;DR

This paper introduces a policy evaluation algorithm for Markov Decision Processes that empirically converges much faster than GTD2-class algorithms, and reports comparable results when it is implemented in the DQN and DDPG deep reinforcement learning frameworks.

Contribution

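The paper presents a true gradient-based policy evaluation algorithm that tracks the Projected Bellman Error. Being a true gradient method sets it apart from the TD($\lambda$) class, and tracking the projected error sets it apart from residual algorithms; empirically it converges faster than GTD2-class algorithms.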

Findings

01 Empirically much faster convergence than GTD2-class algorithms.

02 Achieves comparable results in the DQN and DDPG frameworks.

03 Gives good empirical convergence rates for policy evaluation in Markov Decision Processes.

Abstract

An algorithm is proposed for policy evaluation in Markov Decision Processes which gives good empirical results with respect to convergence rates. The algorithm tracks the Projected Bellman Error and is implemented as a true gradient-based algorithm. In this respect it differs from the TD($\lambda$) class of algorithms. Because it tracks the Projected Bellman Error, it also differs from the class of residual algorithms. Further, the convergence of this algorithm is empirically much faster than that of the GTD2 class of algorithms, which also aim at tracking the Projected Bellman Error. We implemented the proposed algorithm in the DQN and DDPG frameworks and found that it achieves comparable results in both of these experiments.
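The abstract refers to the Projected Bellman Error and to GTD2 without defining either. For reference only, the sketch below evaluates the mean-squared projected Bellman error (MSPBE) for linear value approximation and runs the standard GTD2 update, the baseline the abstract compares against, on a small chain. It is not the algorithm proposed in the paper. The three-state chain, the features, the step sizes, and the names `mspbe` and `gtd2` are all assumptions made for illustration.

```python
import random

# Hypothetical 3-state Markov reward process under a fixed policy (illustrative only).
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]          # transition probabilities P[s][s_next]
R = [0.0, 0.0, 1.0]            # reward received on leaving each state
PHI = [[1.0, 0.0],
       [0.5, 0.5],
       [0.0, 1.0]]             # two linear features per state
GAMMA = 0.9
D = [1.0 / 3.0] * 3            # stationary distribution (uniform: P is doubly stochastic)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mspbe(theta, d=D):
    """MSPBE(theta) = E[delta*phi]^T E[phi phi^T]^{-1} E[delta*phi] under state weights d."""
    g = [0.0, 0.0]                       # E[delta * phi]
    C = [[0.0, 0.0], [0.0, 0.0]]         # E[phi phi^T]
    for s in range(3):
        exp_next_v = sum(P[s][t] * dot(theta, PHI[t]) for t in range(3))
        delta = R[s] + GAMMA * exp_next_v - dot(theta, PHI[s])
        for i in range(2):
            g[i] += d[s] * delta * PHI[s][i]
            for j in range(2):
                C[i][j] += d[s] * PHI[s][i] * PHI[s][j]
    # Solve C x = g with the closed-form 2x2 inverse, then return g^T x.
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    x0 = (C[1][1] * g[0] - C[0][1] * g[1]) / det
    x1 = (C[0][0] * g[1] - C[1][0] * g[0]) / det
    return g[0] * x0 + g[1] * x1


def gtd2(steps=20000, alpha=0.05, beta=0.05, seed=0):
    """Standard GTD2 with linear features on sampled transitions (baseline, not the paper's method)."""
    rng = random.Random(seed)
    theta, w = [0.0, 0.0], [0.0, 0.0]
    s = 0
    for _ in range(steps):
        s_next = rng.choices(range(3), weights=P[s])[0]
        phi, phi_next = PHI[s], PHI[s_next]
        delta = R[s] + GAMMA * dot(theta, phi_next) - dot(theta, phi)  # TD error
        phi_w = dot(phi, w)
        for i in range(2):
            # Primary weights follow a stochastic gradient of the MSPBE.
            theta[i] += alpha * (phi[i] - GAMMA * phi_next[i]) * phi_w
            # Auxiliary weights estimate E[phi phi^T]^{-1} E[delta * phi].
            w[i] += beta * (delta - phi_w) * phi[i]
        s = s_next
    return theta


if __name__ == "__main__":
    theta = gtd2()
    print("MSPBE at theta = 0:  ", mspbe([0.0, 0.0]))
    print("MSPBE after GTD2 run:", mspbe(theta))
```

The MSPBE is computed exactly here because the chain is tiny and its transition model is known; in the DQN and DDPG settings mentioned in the abstract, only sampled transitions are available.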

Taxonomy

Topics: Reinforcement Learning in Robotics · Opinion Dynamics and Social Influence

Methods: Adam · Weight Decay · Batch Normalization · Dense Connections · Experience Replay · Convolution · Q-Learning · Deep Deterministic Policy Gradient · Deep Q-Network