Infinite-Horizon Reinforcement Learning with Multinomial Logistic Function Approximation

Jaehyun Park; Junyeop Kwon; Dabeen Lee

arXiv:2406.13633·cs.LG·October 15, 2024

TL;DR

This paper introduces a new efficient algorithm for infinite-horizon reinforcement learning with multinomial logistic function approximation, providing matching upper and lower regret bounds for both average and discounted reward settings.

Contribution

The paper develops a provably efficient value iteration-based algorithm for MNL-based RL and establishes tight regret bounds, advancing understanding of non-linear function approximation in RL.
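To make the MNL transition model concrete, here is a minimal sketch of how a multinomial logistic model turns features into next-state probabilities. It assumes the standard softmax form, where the probability of each reachable next state is proportional to the exponential of an inner product between a feature vector and a parameter vector. The function name `mnl_transition_probs`, the toy features, and the parameter values are hypothetical and are not taken from the paper.

```python
import math


def mnl_transition_probs(features, theta):
    """Softmax (MNL) next-state distribution for one fixed state-action pair.

    features: dict mapping next_state -> feature vector (list of floats).
    theta:    parameter vector (list of floats), same length as each feature vector.
    Both are illustrative assumptions, not the paper's notation.
    """
    # Linear score for every reachable next state.
    logits = {
        s_next: sum(f * t for f, t in zip(phi, theta))
        for s_next, phi in features.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    max_logit = max(logits.values())
    weights = {s_next: math.exp(z - max_logit) for s_next, z in logits.items()}
    total = sum(weights.values())
    return {s_next: w / total for s_next, w in weights.items()}


# Toy example: three reachable next states, feature dimension d = 2.
features = {"s0": [1.0, 0.0], "s1": [0.0, 1.0], "s2": [0.0, 0.0]}
theta = [0.5, -0.5]
probs = mnl_transition_probs(features, theta)
```

Unlike a linear transition model, this form always produces a valid probability distribution, which is one common motivation for studying MNL function approximation.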

Findings

01

Achieves a regret upper bound of $\tilde{O}(d D \sqrt{T})$ for average reward

02

Achieves a regret upper bound of $\tilde{O}(d (1 - \gamma)^{-2} \sqrt{T})$ for discounted reward

03

Provides several lower bounds matching the upper bounds, including for finite-horizon episodic MDPs

Abstract

We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. We develop a provably efficient discounted value iteration-based algorithm that works for both infinite-horizon average-reward and discounted-reward settings. For average-reward communicating MDPs, the algorithm guarantees a regret upper bound of $\tilde{O}(d D \sqrt{T})$ where $d$ is the dimension of feature mapping, $D$ is the diameter of the underlying MDP, and $T$ is the horizon. For discounted-reward MDPs, our algorithm achieves $\tilde{O}(d (1 - \gamma)^{-2} \sqrt{T})$ regret where $\gamma$ is the discount factor. Then we complement these upper bounds by providing several regret lower bounds. We prove a lower bound of $\Omega(d \sqrt{D T})$ for learning…
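As a rough illustration of the discounted value iteration that the abstract builds on, the sketch below runs plain value iteration on a tiny tabular MDP whose transition probabilities are already known. It is not the paper's algorithm: the parameter estimation, confidence sets, and optimism that a regret-minimizing learner would need are omitted, and the function name `discounted_value_iteration`, the toy MDP, and the tolerance are assumptions made for the example.

```python
def discounted_value_iteration(P, R, gamma, tol=1e-10, max_iters=100_000):
    """Plain discounted value iteration with known transitions.

    P[s][a]: dict mapping next_state -> probability.
    R[s][a]: immediate reward for taking action a in state s.
    gamma:   discount factor in [0, 1).
    """
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        # Bellman optimality backup for every state.
        new_V = {
            s: max(
                R[s][a] + gamma * sum(p * V[s_next] for s_next, p in P[s][a].items())
                for a in P[s]
            )
            for s in P
        }
        gap = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if gap < tol:
            break
    return V


# Hypothetical two-state MDP: "g" pays reward 1 forever, "b" can move to "g".
P = {
    "g": {"stay": {"g": 1.0}},
    "b": {"stay": {"b": 1.0}, "go": {"g": 1.0}},
}
R = {
    "g": {"stay": 1.0},
    "b": {"stay": 0.0, "go": 0.0},
}
V = discounted_value_iteration(P, R, gamma=0.9)
```

In this toy example the value of `g` is $1 / (1 - \gamma) = 10$ and the value of `b` is $\gamma \cdot 10 = 9$, which shows how values scale with $(1 - \gamma)^{-1}$.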

Taxonomy

Topics: Elevator Systems and Control · Scheduling and Optimization Algorithms