Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation

Long-Fei Li; Yu-Jie Zhang; Peng Zhao; Zhi-Hua Zhou

arXiv:2405.17061·cs.LG·January 17, 2025

TL;DR

This paper introduces reinforcement learning algorithms for MDPs with multinomial logit (MNL) function approximation that are both statistically efficient and computationally practical. The regret bound no longer depends on the problem-dependent quantity $κ^{-1}$ in its dominant term, and the paper gives the first lower bound for this setting.

Contribution

It proposes an RL algorithm whose regret bound removes the dependence on $κ^{-1}$, the inverse of a problem-dependent quantity $κ$, from the dominant term, and introduces a computationally efficient version of that algorithm.

Findings

01

Achieves regret of $\tilde{O}(d H^{2} \sqrt{K} + κ^{-1} d^{2} H^{2})$

02

Eliminates dependence on $κ^{-1}$ in the dominant regret term

03

Provides the first lower bound for this class of problems
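
To make the second finding concrete, here is a minimal sketch that evaluates the two terms of the bound $\tilde{O}(d H^{2} \sqrt{K} + κ^{-1} d^{2} H^{2})$ separately. It ignores constants and logarithmic factors, so the numbers indicate scaling only. The function names and the sample values of `d`, `H`, and `kappa` are illustrative assumptions, not taken from the paper.

```python
import math


def regret_terms(d, H, K, kappa):
    """Return (leading, lower_order) terms of the bound, up to constants and logs.

    leading     = d * H^2 * sqrt(K)       (grows with the number of episodes K)
    lower_order = kappa^{-1} * d^2 * H^2  (independent of K)
    """
    leading = d * H**2 * math.sqrt(K)
    lower_order = (1.0 / kappa) * d**2 * H**2
    return leading, lower_order


def crossover_episodes(d, kappa):
    """Episodes K at which the two terms are equal: sqrt(K) = d / kappa."""
    return (d / kappa) ** 2


# Hypothetical problem sizes, chosen only for illustration.
d, H, kappa = 10, 5, 0.1
for K in (100, 10_000, 1_000_000):
    leading, lower_order = regret_terms(d, H, K, kappa)
    print(f"K={K}: leading={leading:.0f}, lower_order={lower_order:.0f}")
print("terms balance at K =", crossover_episodes(d, kappa))
```

Under these assumptions the $κ^{-1}$ term is a fixed cost that does not grow with $K$, so the $\sqrt{K}$ term dominates once $K$ exceeds roughly $(d / κ)^{2}$.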

Abstract

We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its significant benefits, incorporating the non-linear function raises substantial challenges in both statistical and computational efficiency. The best-known result of Hwang and Oh [2023] has achieved an $\tilde{O}(κ^{-1} d H^{2} \sqrt{K})$ regret upper bound, where $κ$ is a problem-dependent quantity, $d$ is the feature dimension, $H$ is the episode length, and $K$ is the number of episodes. However, we observe that $κ^{-1}$ exhibits polynomial dependence on the number of reachable states, which can be as large as the state space size in the worst case and thus undermines the motivation for function approximation. Additionally, their method requires storing all historical data and the time complexity…
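
As an illustration of why an MNL parameterization yields valid probability distributions, the sketch below computes next-state probabilities as a softmax over linear scores $φ(s, a, s')^{\top} θ$ for the reachable next states. This is the standard multinomial logit form. The feature matrix, the parameter vector, and the function name are assumptions made for this example, not the paper's code.

```python
import numpy as np


def mnl_transition_probs(features, theta):
    """Softmax (multinomial logit) probabilities over reachable next states.

    features: array of shape (num_next_states, d), one feature vector per
              candidate next state for a fixed (state, action) pair.
    theta:    parameter vector of shape (d,).
    """
    logits = features @ theta
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Hypothetical example: 5 reachable next states, feature dimension d = 3.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 3))
theta = rng.normal(size=3)

probs = mnl_transition_probs(features, theta)
print(probs, probs.sum())
```

Every entry is strictly positive and the entries sum to one for any `theta`, which is the validity property the abstract refers to. A linear model of the transition probabilities does not guarantee this without extra constraints.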
