Critic Algorithms using Cooperative Networks
Debangshu Banerjee, Kavita Wagh

TL;DR
This paper introduces a new policy evaluation algorithm for Markov Decision Processes that empirically converges faster than GTD2-class methods, and reports comparable results when it is used within the DQN and DDPG deep reinforcement learning frameworks.
Contribution
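The paper presents a true gradient-based policy evaluation algorithm that tracks the Projected Bellman Error. Being a true gradient method sets it apart from the TD(λ) class, and tracking the Projected Bellman Error sets it apart from residual algorithms, which minimize the Bellman residual instead. Empirically, it converges faster than GTD2-class algorithms that target the same objective.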
Findings
Empirically faster convergence than GTD2 algorithms.
Achieves comparable results in DQN and DDPG frameworks.
Effective in policy evaluation for reinforcement learning.
Abstract
An algorithm is proposed for policy evaluation in Markov Decision Processes which gives good empirical results with respect to convergence rates. The algorithm tracks the Projected Bellman Error and is implemented as a true gradient-based algorithm. In this respect it differs from the TD(λ) class of algorithms. Because it tracks the Projected Bellman Error, it also differs from the class of residual algorithms. Further, the convergence of this algorithm is empirically much faster than that of the GTD2 class of algorithms, which also aim at tracking the Projected Bellman Error. We implemented the proposed algorithm in the DQN and DDPG frameworks and found that it achieves comparable results in both of these experiments.
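For context, the GTD2 baseline named in the abstract and the Projected Bellman Error objective it tracks can be sketched as follows. This is not the algorithm proposed in the paper, whose update rule the abstract does not give; it is a minimal illustration of the standard GTD2 update with linear features. The five-state ring, the feature matrix, the step sizes, and the helper names (`mspbe`, `gtd2_step`, `run_gtd2_demo`) are all assumptions made for the example.

```python
import numpy as np


def mspbe(theta, A, b, C):
    """Mean squared projected Bellman error: (b - A theta)^T C^{-1} (b - A theta)."""
    g = b - A @ theta
    return float(g @ np.linalg.solve(C, g))


def gtd2_step(theta, w, phi, reward, phi_next, gamma, alpha, beta):
    """One standard GTD2 update for linear value estimates V(s) = theta . phi(s)."""
    delta = reward + gamma * (theta @ phi_next) - theta @ phi  # TD error
    theta_new = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    w_new = w + beta * (delta - phi @ w) * phi  # auxiliary weights
    return theta_new, w_new


def run_gtd2_demo(num_steps=50_000, seed=0, alpha=0.05, beta=0.05):
    """Run GTD2 on a hypothetical 5-state ring and report MSPBE before and after."""
    n, gamma = 5, 0.9

    # Assumed toy chain: step left or right on a ring with equal probability,
    # so the stationary distribution is uniform.
    P = np.zeros((n, n))
    for s in range(n):
        P[s, (s - 1) % n] = 0.5
        P[s, (s + 1) % n] = 0.5
    r = np.zeros(n)
    r[4] = 1.0  # reward received on leaving state 4
    D = np.eye(n) / n

    # Assumed features: three overlapping "bumps" over the five states.
    Phi = np.array([
        [1.0, 0.0, 0.0],
        [0.5, 0.5, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.5, 0.5],
        [0.0, 0.0, 1.0],
    ])

    # Exact expectations, used only to measure the objective.
    A = Phi.T @ D @ (np.eye(n) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    C = Phi.T @ D @ Phi

    rng = np.random.default_rng(seed)
    theta = np.zeros(Phi.shape[1])
    w = np.zeros(Phi.shape[1])
    initial = mspbe(theta, A, b, C)

    s = 0
    for _ in range(num_steps):
        step = 1 if rng.random() < 0.5 else -1
        s_next = (s + step) % n
        theta, w = gtd2_step(theta, w, Phi[s], r[s], Phi[s_next],
                             gamma, alpha, beta)
        s = s_next

    return initial, mspbe(theta, A, b, C), theta


if __name__ == "__main__":
    initial, final, theta = run_gtd2_demo()
    print(f"MSPBE before: {initial:.4f}, after: {final:.6f}")
```

The auxiliary weights `w` estimate the expected TD error given the features, which is what lets the `theta` update follow a gradient of the projected objective rather than the semi-gradient used by TD(λ).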
Taxonomy
Topics: Reinforcement Learning in Robotics · Opinion Dynamics and Social Influence
Methods: Adam · Weight Decay · Batch Normalization · Dense Connections · Experience Replay · Convolution · Q-Learning · Deep Deterministic Policy Gradient · Deep Q-Network
