Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation

Wooseong Cho; Taehyun Hwang; Joongkyu Lee; Min-hwan Oh

arXiv:2405.20165·stat.ML·November 1, 2024

TL;DR

This paper introduces computationally efficient randomized-exploration algorithms for reinforcement learning with multinomial logistic (MNL) transition models, proves frequentist regret guarantees for them, and reports numerical experiments in which they outperform existing methods.

Contribution

The paper develops the first randomized RL algorithms with constant-time per-episode computation for MNL transition models, with provable regret bounds.

Findings

01

RRL-MNL achieves a regret of $\tilde{O}(\kappa^{-1} d^{\frac{3}{2}} H^{\frac{3}{2}} \sqrt{T})$

02

ORRL-MNL improves the regret's dependence on $\frac{1}{\kappa}$, with an additional term

03

Numerical experiments show the algorithms outperform existing methods in practice

Abstract

We study reinforcement learning with multinomial logistic (MNL) function approximation where the underlying transition probability kernel of the Markov decision processes (MDPs) is parametrized by an unknown transition core with features of state and action. For the finite horizon episodic setting with inhomogeneous state transitions, we propose provably efficient algorithms with randomized exploration having frequentist regret guarantees. For our first algorithm, $\texttt{RRL-MNL}$, we adapt optimistic sampling to ensure the optimism of the estimated value function with sufficient frequency. We establish that $\texttt{RRL-MNL}$ achieves a $\tilde{O}(\kappa^{-1} d^{\frac{3}{2}} H^{\frac{3}{2}} \sqrt{T})$ frequentist regret bound with constant-time computational cost per episode. Here, $d$ is the dimension of the transition core, $H$ is the horizon length, $T$ is the total number of…
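As a rough illustration of the MNL transition model described in the abstract, the sketch below computes next-state probabilities as a softmax over inner products between feature vectors and a transition core. This is the standard multinomial logistic form, not code from the paper: the function name `mnl_transition_probs`, the array shapes, and the random features are all assumptions made for the example.

```python
import numpy as np


def mnl_transition_probs(features, theta):
    """Softmax (MNL) probabilities over candidate next states.

    features: (n_next_states, d) array, one hypothetical feature vector per
              candidate next state for a fixed (state, action) pair.
    theta:    (d,) array standing in for the unknown transition core.
    """
    logits = features @ theta
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Illustrative values only: 3 reachable next states, core dimension d = 4.
rng = np.random.default_rng(0)
phi = rng.normal(size=(3, 4))
theta = rng.normal(size=4)
probs = mnl_transition_probs(phi, theta)
uniform = mnl_transition_probs(phi, np.zeros(4))
```

With a zero transition core every logit is equal, so `uniform` assigns the same probability to each next state; a nonzero `theta` tilts the distribution toward next states whose features align with it.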

Videos

Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation (SlidesLive)

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics · Metaheuristic Optimization Algorithms Research