State-free Reinforcement Learning

Mingyu Chen; Aldo Pacchiano; Xuezhou Zhang

arXiv:2409.18439·cs.LG·September 30, 2024

Open Access · 1 Review

TL;DR

This paper introduces a state-free reinforcement learning algorithm that operates without prior knowledge of the environment's state space, aiming for hyper-parameter free RL.

Contribution

It presents the first algorithm for state-free RL with regret bounds independent of the entire state space, advancing towards parameter-free RL.

Findings

01

Regret depends only on the reachable state set, not the entire state space.

02

Algorithm requires no prior state information.

03

Progress towards hyper-parameter free reinforcement learning.

Abstract

In this work, we study the state-free RL problem, in which the algorithm has no information about the states before interacting with the environment. Specifically, denoting the reachable state set by $S^{\Pi} := \{ s \mid \max_{\pi \in \Pi} q^{P, \pi}(s) > 0 \}$, we design an algorithm that requires no information about the state space $S$ while achieving a regret that is completely independent of $S$ and depends only on $S^{\Pi}$. We view this as a concrete first step towards parameter-free RL, with the goal of designing RL algorithms that require no hyper-parameter tuning.
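To make the definition of the reachable state set concrete, here is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes a tabular episodic MDP with known transitions, takes $\Pi$ to be the set of all policies, and reads $q^{P, \pi}(s)$ as the probability of visiting $s$ within an episode. Under those assumptions, a state belongs to $S^{\Pi}$ exactly when some chain of positive-probability transitions leads to it from an initial state within the horizon. The names `reachable_states`, `P`, `init_states`, and `horizon` are hypothetical and chosen for illustration only.

```python
def reachable_states(P, init_states, horizon):
    """Return the states visitable with positive probability within one episode.

    P[s][a] is a dict {next_state: probability} (hypothetical layout).
    An episode visits `horizon` states, so it makes at most horizon - 1 transitions.
    """
    reachable = set(init_states)
    frontier = set(init_states)
    for _ in range(horizon - 1):
        next_frontier = set()
        for s in frontier:
            # Some policy can pick any action, so every action's support counts.
            for next_dist in P.get(s, {}).values():
                for s_next, prob in next_dist.items():
                    if prob > 0 and s_next not in reachable:
                        next_frontier.add(s_next)
        reachable |= next_frontier
        frontier = next_frontier
    return reachable


# Toy MDP: the state space S has five states, but "s4" has no incoming path.
P = {
    "s0": {"left": {"s1": 1.0}, "right": {"s2": 0.5, "s0": 0.5}},
    "s1": {"stay": {"s1": 1.0}},
    "s2": {"go": {"s3": 1.0}},
    "s3": {"stay": {"s3": 1.0}},
    "s4": {"go": {"s0": 1.0}},
}

print(sorted(reachable_states(P, ["s0"], horizon=3)))
```

In this toy example $|S| = 5$, while the reachable set contains only four states. A bound stated in terms of $S^{\Pi}$ rather than $S$ would therefore not be affected by adding more unreachable states such as `s4`. Note that a state-free learner does not have access to `P`; the sketch only illustrates the set that the regret bound refers to.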

Peer Reviews

Decision·NeurIPS 2024 poster

Reviewer 01 · Rating 7 · Confidence 4

Strengths

* The algorithmic solution is quite elegant since it can be applied to any "basic" RL algorithm with regret guarantees.
* The final result achieves the desired removal of the dependency on S, which is replaced by the size of the reachable states.
* The result holds for both stochastic and adversarial settings and it can be extended to removing the dependency on the horizon H as well.

Weaknesses

* I would encourage the authors to provide a clean comparison of the final bounds in the stochastic setting with the best available bounds. In particular, I'm wondering whether the restart leads to extra log terms.
* Related to the previous point, I suggest that the authors make explicit the bounds for simple doubling trick strategies, so as to have a point of comparison.
* What exactly is the role of epsilon? It looks like it can be directly set to 0 and everything works the same.

Additional refere

Taxonomy

Topics: EEG and Brain-Computer Interfaces · Reinforcement Learning in Robotics

Methods: Sparse Evolutionary Training