Full Gradient Deep Reinforcement Learning for Average-Reward Criterion

Tejas Pagare; Vivek Borkar; Konstantin Avrachenkov

arXiv:2304.03729 · eess.SY · April 10, 2023 · 1 citation

Open Access

TL;DR

This paper extends the Full Gradient DQN algorithm to average reward Markov decision processes, compares it with existing methods, and demonstrates improved convergence rates across various tasks.

Contribution

It introduces a provably convergent Full Gradient DQN for average reward problems and applies it to learn Whittle indices for restless bandits.

Findings

01

Full Gradient DQN shows a better convergence rate than the compared methods across the tasks tested.

02

Experimental comparison favors Full Gradient DQN over RVI Q-Learning.

03

The method is extended to learn Whittle indices for Markovian restless multi-armed bandits.

Abstract

We extend the provably convergent Full Gradient DQN algorithm for discounted reward Markov decision processes from Avrachenkov et al. (2021) to average reward problems. We experimentally compare the widely used RVI Q-Learning with the recently proposed Differential Q-Learning in the neural function approximation setting, with Full Gradient DQN and DQN. We also extend this to learn Whittle indices for Markovian restless multi-armed bandits. We observe a better convergence rate of the proposed Full Gradient variant across different tasks.
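
To make the distinction between a DQN-style update and a full-gradient update concrete, the following is a minimal sketch, not the paper's implementation. It assumes a tabular Q-function stored in an array `theta[s, a]`, an RVI-style reference value f(Q) = Q(s0, a0) in place of the unknown average reward, and a single transition per update. The function names, the `ref` parameter, and the learning rate are hypothetical. The published algorithm uses neural networks and may differ in details (for example, in how the Bellman error is averaged over samples) that this sketch does not try to reproduce.

```python
import numpy as np

def bellman_error(theta, s, a, r, s_next, ref=(0, 0)):
    """Average-reward Bellman error for a tabular Q stored in theta[s, a].

    Assumption: the average reward is replaced by the reference value
    f(Q) = Q(s0, a0), as in RVI Q-learning; `ref` is that (s0, a0) pair.
    """
    return r + theta[s_next].max() - theta[ref] - theta[s, a]

def semi_gradient_step(theta, s, a, r, s_next, lr, ref=(0, 0)):
    """DQN-style step: the target is treated as a constant,
    so only Q(s, a) is moved toward it."""
    delta = bellman_error(theta, s, a, r, s_next, ref)
    new_theta = theta.copy()
    new_theta[s, a] += lr * delta
    return new_theta

def full_gradient_step(theta, s, a, r, s_next, lr, ref=(0, 0)):
    """Full-gradient step: gradient descent on 0.5 * delta**2,
    differentiating through every term in which theta appears."""
    delta = bellman_error(theta, s, a, r, s_next, ref)
    grad = np.zeros_like(theta)          # gradient of delta w.r.t. theta
    best = theta[s_next].argmax()
    grad[s_next, best] += 1.0            # from  max_b Q(s', b)
    grad[ref] -= 1.0                     # from -f(Q) = -Q(s0, a0)
    grad[s, a] -= 1.0                    # from -Q(s, a)
    return theta - lr * delta * grad
```

In this toy setting the semi-gradient step changes a single table entry, while the full-gradient step also adjusts the next-state entry and the reference entry, because all three appear in the Bellman error being minimized.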

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics

Methods: Dense Connections · Convolution · Q-Learning · Deep Q-Network