Online Reinforcement Learning with Uncertain Episode Lengths
Debmalya Mandal, Goran Radanovic, Jiarui Gan, Adish Singla, Rupak Majumdar

TL;DR
This paper introduces a framework for episodic reinforcement learning with uncertain episode lengths, linking it to general discounting, and proposes algorithms with regret bounds that adapt to this uncertainty.
Contribution
It establishes the equivalence between uncertain episode lengths and general discounting, and develops regret-minimizing algorithms that adapt to unknown episode length distributions.
Findings
Regret bounds are derived for various discounting schemes.
Algorithms perform well even with unknown episode length distributions.
Comparison shows advantages over traditional episodic RL methods.
Abstract
Existing episodic reinforcement algorithms assume that the length of an episode is fixed across time and known a priori. In this paper, we consider a general framework of episodic reinforcement learning when the length of each episode is drawn from a distribution. We first establish that this problem is equivalent to online reinforcement learning with general discounting where the learner is trying to optimize the expected discounted sum of rewards over an infinite horizon, but where the discounting function is not necessarily geometric. We show that minimizing regret with this new general discounting is equivalent to minimizing regret with uncertain episode lengths. We then design a reinforcement learning algorithm that minimizes regret with general discounting but acts for the setting with uncertain episode lengths. We instantiate our general bound for different types of discounting,…
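The link between random episode lengths and general discounting can be checked numerically. The sketch below is illustrative and not taken from the paper: it assumes the episode length H is drawn independently of the rewards, uses a fixed reward sequence in place of a policy rollout, and all function names are hypothetical. Under that assumption, the expected total reward over a random-length episode equals a discounted sum whose weight at step t is P(H >= t).

```python
# Illustrative sketch only; function names are hypothetical, not from the paper.
# Assumption: the episode length H is independent of the reward sequence.
from fractions import Fraction


def discount_from_length_pmf(pmf):
    """pmf[h-1] = P(H = h). Returns gamma with gamma[t-1] = P(H >= t)."""
    gamma = []
    tail = sum(pmf)  # P(H >= 1), which is 1 for a proper distribution
    for p in pmf:
        gamma.append(tail)
        tail -= p  # move from P(H >= t) to P(H >= t + 1)
    return gamma


def expected_return_by_length(pmf, rewards):
    """E[sum_{t=1}^{H} r_t], computed by averaging over the episode length H."""
    return sum(p * sum(rewards[:h]) for h, p in enumerate(pmf, start=1))


def discounted_return(gamma, rewards):
    """sum_t gamma(t) * r_t with a general (not necessarily geometric) discount."""
    return sum(g * r for g, r in zip(gamma, rewards))


# Toy example: H is 1, 2, or 3 with probabilities 1/2, 1/4, 1/4.
pmf = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
rewards = [Fraction(1), Fraction(2), Fraction(4)]

gamma = discount_from_length_pmf(pmf)
by_length = expected_return_by_length(pmf, rewards)
by_discount = discounted_return(gamma, rewards)
```

In this toy example both quantities evaluate to the same value, and a geometrically distributed H would recover the usual geometric discount. Exact fractions are used so the comparison does not depend on floating-point rounding.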
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Decision-Making and Behavioral Economics
