Optimistic Q-learning for average reward and episodic reinforcement learning

Priyank Agrawal; Shipra Agrawal

arXiv:2407.13743·cs.LG·June 17, 2025

TL;DR

This paper introduces an optimistic Q-learning algorithm for average reward reinforcement learning that strictly generalizes the episodic setting, achieves a regret bound of $\tilde{O}(H^{5} S \sqrt{AT})$, and relies on a new averaged Bellman operator $\overline{L}$ with contraction properties.

Contribution

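It presents a new optimistic Q-learning method for average reward RL under a relaxed assumption: under every policy, some frequent state $s_{0}$ is reached within time $H$, either in expectation or with constant probability. It introduces the $\overline{L}$ operator, which has contraction properties, and unifies the analysis of the episodic and non-episodic settings.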

Findings

01 Regret bound of $\tilde{O}(H^{5} S \sqrt{AT})$

02 The $\overline{L}$ operator has a strict contraction in span

03 The algorithm generalizes the episodic and average reward settings

Abstract

We present an optimistic Q-learning algorithm for regret minimization in average reward reinforcement learning under an additional assumption on the underlying MDP that for all policies, the time to visit some frequent state $s_{0}$ is finite and upper bounded by $H$, either in expectation or with constant probability. Our setting strictly generalizes the episodic setting and is significantly less restrictive than the assumption of bounded hitting time for all states made by most previous literature on model-free algorithms in average reward settings. We demonstrate a regret bound of $\tilde{O}(H^{5} S \sqrt{AT})$, where $S$ and $A$ are the numbers of states and actions, and $T$ is the horizon. A key technical novelty of our work is the introduction of an $\overline{L}$ operator defined as $\overline{L} v = \frac{1}{H} \sum_{h = 1}^{H} L^{h} v$, where $L$ denotes the Bellman operator.…
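The definition $\overline{L} v = \frac{1}{H} \sum_{h = 1}^{H} L^{h} v$ can be made concrete with a small numerical sketch. The code below is an illustration only, not the paper's algorithm: the three-state MDP, the reward values, and the function names `bellman`, `l_bar`, and `span` are all assumptions chosen for this example. In this toy MDP every policy reaches state $s_{0} = 0$ within $H = 3$ steps, yet the undiscounted Bellman operator $L$ does not shrink the span of a difference of two value vectors, while the averaged operator does.

```python
import numpy as np

def bellman(v, P, R):
    """Undiscounted Bellman optimality operator:
    (Lv)(s) = max_a [ R[s, a] + sum_s' P[s, a, s'] * v[s'] ]."""
    return np.max(R + P @ v, axis=1)

def l_bar(v, P, R, H):
    """Averaged operator from the abstract: (1/H) * sum_{h=1}^{H} L^h v."""
    u = np.asarray(v, dtype=float)
    total = np.zeros_like(u)
    for _ in range(H):
        u = bellman(u, P, R)  # u now holds the next power L^h v
        total += u
    return total / H

def span(v):
    """Span seminorm: max(v) - min(v)."""
    return float(np.max(v) - np.min(v))

# Hypothetical toy MDP (an assumption for illustration, not from the paper).
S, A, H = 3, 2, 3
P = np.zeros((S, A, S))  # P[s, a, s'] = transition probability
R = np.zeros((S, A))     # R[s, a] = reward
for s in range(S):
    P[s, 0, (s + 1) % S] = 1.0  # action 0: step along the cycle 0 -> 1 -> 2 -> 0
    P[s, 1, 0] = 1.0            # action 1: jump straight to s0 = 0
    R[s, 0] = 1.0               # action 0 earns reward 1, action 1 earns 0

u = np.array([0.0, 0.5, -0.25])
v = np.zeros(S)

one_step = span(bellman(u, P, R) - bellman(v, P, R))
averaged = span(l_bar(u, P, R, H) - l_bar(v, P, R, H))
print(span(u - v), one_step, averaged)
```

For these particular vectors the one-step span stays at 0.75, equal to the span of `u - v`, while the span after applying the averaged operator is 0. This is a single hand-picked instance; the general contraction statement and its contraction factor are established in the paper and are not reproduced here.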

Taxonomy

Topics: Mental Health Research Topics

Methods: Q-Learning