Full Gradient Deep Reinforcement Learning for Average-Reward Criterion
Tejas Pagare, Vivek Borkar, Konstantin Avrachenkov

TL;DR
This paper extends the Full Gradient DQN algorithm to average-reward Markov decision processes, compares it with existing methods, and reports a better convergence rate across various tasks.
Contribution
The paper extends the provably convergent Full Gradient DQN from discounted-reward to average-reward problems and applies it to learn Whittle indices for Markovian restless multi-armed bandits.
Findings
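The Full Gradient variant shows a better convergence rate across the tasks tested.
Experiments compare RVI Q-Learning and Differential Q-Learning under neural function approximation, each with Full Gradient DQN and standard DQN.
The approach is also extended to learn Whittle indices for Markovian restless multi-armed bandits.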
Abstract
We extend the provably convergent Full Gradient DQN algorithm for discounted reward Markov decision processes from Avrachenkov et al. (2021) to average reward problems. We experimentally compare widely used RVI Q-Learning with recently proposed Differential Q-Learning in the neural function approximation setting with Full Gradient DQN and DQN. We also extend this to learn Whittle indices for Markovian restless multi-armed bandits. We observe a better convergence rate of the proposed Full Gradient variant across different tasks.
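To make the distinction in the abstract concrete, the following is a minimal sketch of one semi-gradient (DQN-style) step and one full-gradient step on an RVI-style average-reward Bellman error. It is not the authors' implementation. It assumes linear function approximation Q(s, a) = theta · phi[s, a] instead of a neural network, a single sample instead of a minibatch, and a fixed reference state-action pair `ref` as the offset f(Q). All names (`phi`, `ref`, `lr`, the function names) are hypothetical.

```python
import numpy as np


def rvi_bellman_error(theta, phi, s, a, r, s_next, ref):
    """RVI-style error: r - Q(ref) + max_b Q(s_next, b) - Q(s, a).

    phi has shape (n_states, n_actions, d); ref is a (state, action) tuple.
    Returns the error and the greedy action at s_next.
    """
    q_next = phi[s_next] @ theta          # Q(s_next, b) for every action b
    b = int(np.argmax(q_next))
    delta = r - theta @ phi[ref] + q_next[b] - theta @ phi[s, a]
    return float(delta), b


def semi_gradient_step(theta, phi, s, a, r, s_next, ref, lr):
    """DQN-style step: the target is treated as a constant,
    so only Q(s, a) is differentiated."""
    delta, _ = rvi_bellman_error(theta, phi, s, a, r, s_next, ref)
    return theta + lr * delta * phi[s, a]


def full_gradient_step(theta, phi, s, a, r, s_next, ref, lr):
    """Full-gradient step: descend on 0.5 * delta**2, differentiating
    every theta-dependent term (next-state max, reference offset, Q(s, a))."""
    delta, b = rvi_bellman_error(theta, phi, s, a, r, s_next, ref)
    grad_delta = phi[s_next, b] - phi[ref] - phi[s, a]
    return theta - lr * delta * grad_delta


# Tiny tabular example: 2 states, 2 actions, one-hot features.
phi = np.eye(4).reshape(2, 2, 4)
theta0 = np.zeros(4)
sample = dict(s=0, a=0, r=1.0, s_next=1, ref=(0, 1))

theta_semi = semi_gradient_step(theta0, phi, lr=0.1, **sample)
theta_full = full_gradient_step(theta0, phi, lr=0.1, **sample)
```

In this sketch the semi-gradient step changes only the weight tied to Q(s, a), while the full-gradient step also moves the weights behind the next-state maximum and the reference offset. That is the structural difference between the DQN and Full Gradient DQN updates compared in the abstract.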
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics
Methods: Dense Connections · Convolution · Q-Learning · Deep Q-Network
