Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation

Taehyun Hwang; Min-hwan Oh

arXiv:2212.13540 · stat.ML · November 19, 2024

TL;DR

This paper introduces a new model-based reinforcement learning algorithm for Markov decision processes with multinomial logistic transition models, providing theoretical guarantees and demonstrating superior empirical performance.

Contribution

It presents the first provably efficient RL algorithm for multinomial logistic transition models, extending beyond linear MDPs with rigorous regret bounds.

Findings

01

Achieves an $\tilde{O}(d\sqrt{H^{3} T})$ regret bound

02

Outperforms existing methods in numerical evaluations

03

Provides the first theoretical guarantees for this class of models

Abstract

We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDPs) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in analyzing algorithms in the linear MDP setting, the understanding of more general transition models remains very limited. In this paper, we establish a provably efficient RL algorithm for the MDP whose state transition is given by a multinomial logistic model. To balance the exploration-exploitation trade-off, we propose an upper confidence bound-based algorithm. We show that our proposed algorithm achieves an $\tilde{O}(d\sqrt{H^{3} T})$ regret bound, where $d$ is the dimension of the transition core, $H$ is the horizon, and $T$ is the total number of steps. To the best of our knowledge, this is the first model-based RL algorithm with multinomial logistic…
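As a rough illustration of the transition model the abstract describes, the sketch below computes next-state probabilities as a softmax over inner products between feature vectors and a transition core. The function name `mnl_transition_probs`, the toy features, and the toy value of `theta` are all hypothetical, and the paper's exact feature construction and estimation procedure may differ; this only shows the standard multinomial logistic form.

```python
import math

def mnl_transition_probs(features, theta):
    """Multinomial logistic (softmax) transition probabilities.

    features: list of feature vectors, one per candidate next state,
              standing in for phi(s, a, s') (illustrative assumption).
    theta:    transition core of the same dimension d (assumed known here;
              in the learning problem it is unknown and must be estimated).
    """
    # Linear score for each candidate next state.
    logits = [sum(f_j * t_j for f_j, t_j in zip(phi, theta)) for phi in features]
    # Subtract the maximum logit for numerical stability before exponentiating.
    shift = max(logits)
    weights = [math.exp(z - shift) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example: three reachable next states, feature dimension d = 2.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
theta = [0.5, -0.5]
probs = mnl_transition_probs(features, theta)
print(probs)
```

Because the probabilities depend on `theta` through a softmax rather than linearly, this model is not a linear MDP, which is the gap the abstract points to.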

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Simulation Techniques and Applications