Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation
Taehyun Hwang, Min-hwan Oh

TL;DR
This paper introduces a new model-based reinforcement learning algorithm for Markov decision processes with multinomial logistic transition models, providing theoretical guarantees and demonstrating superior empirical performance.
Contribution
It presents the first provably efficient RL algorithm for multinomial logistic transition models, extending beyond linear MDPs with rigorous regret bounds.
Findings
Achieves a $\tilde{O}(d\sqrt{H^3 T})$ regret bound
Outperforms existing methods in numerical evaluations
Provides the first theoretical guarantees for this class of models
Abstract
We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDP) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in analyzing algorithms in the linear MDP setting, the understanding of more general transition models is very restrictive. In this paper, we establish a provably efficient RL algorithm for the MDP whose state transition is given by a multinomial logistic model. To balance the exploration-exploitation trade-off, we propose an upper confidence bound-based algorithm. We show that our proposed algorithm achieves a $\tilde{O}(d\sqrt{H^3 T})$ regret bound, where $d$ is the dimension of the transition core, $H$ is the horizon, and $T$ is the total number of steps. To the best of our knowledge, this is the first model-based RL algorithm with multinomial logistic…
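As a rough illustration of the multinomial logistic transition model named in the abstract, the sketch below computes next-state probabilities as a softmax over inner products between feature vectors and a transition core. The function name `mnl_transition_probs`, the feature vectors, and the value of `theta` are hypothetical and chosen only for illustration; this is not the authors' code, and it does not cover the confidence-bound construction of their algorithm.

```python
import math


def mnl_transition_probs(features, theta):
    """Return P(s' | s, a) for each reachable next state s'.

    features: list of feature vectors phi(s, a, s'), one per reachable s'
              (hypothetical inputs for this sketch).
    theta:    transition core, a parameter vector of the same dimension.
    """
    # Linear score for each candidate next state: phi(s, a, s')^T theta.
    logits = [sum(f_i * t_i for f_i, t_i in zip(phi, theta)) for phi in features]
    # Subtract the maximum logit for numerical stability; the softmax is unchanged.
    max_logit = max(logits)
    weights = [math.exp(z - max_logit) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three reachable next states, feature dimension d = 2.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
theta = [0.5, -0.5]
probs = mnl_transition_probs(features, theta)
print(probs)
```

The probabilities sum to one by construction, and a next state with a larger score receives a larger probability, which is the property that distinguishes this model from a linear transition model where validity of the distribution has to be assumed separately.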
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Simulation Techniques and Applications
