State Action Separable Reinforcement Learning

Ziyao Zhang; Liang Ma; Kin K. Leung; Konstantinos Poularakis; and Mudhakar Srivatsa

arXiv:2006.03713 · cs.LG · June 9, 2020

Open Access

TL;DR

This paper introduces State Action Separable Reinforcement Learning (sasRL), a new paradigm that decouples actions from value learning, improving efficiency and convergence in complex decision-making tasks.

Contribution

The paper proposes sasRL, a novel RL framework that separates actions from value functions and incorporates a transition model for enhanced learning efficiency.

Findings

01. sasRL outperforms traditional RL algorithms by up to 75% in gaming scenarios.

02. Theoretical analysis gives a convergence time of $O(T^{1/k})$ for sasRL, indicating faster convergence under certain conditions.

03. Decoupling actions from value functions reduces the complexity of the learning process.
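To make the third finding concrete, here is a minimal Python sketch. It is not the paper's algorithm or benchmark. It assumes a hypothetical 3x3 deterministic grid world with five actions and simply counts how many entries a value table would need when indexed by (state, action) versus by (state, next state). The names `step`, `action_for`, `SIZE`, and `ACTIONS` are illustrative assumptions. The `action_for` lookup is only a stand-in for the transition model mentioned under Contribution, which in the paper is learned.

```python
from itertools import product

# Hypothetical 3x3 deterministic grid world (an illustration, not the paper's setup).
SIZE = 3
ACTIONS = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
    "stay": (0, 0),
}


def step(state, action):
    """Return the next cell; a move that would leave the grid keeps the agent in place."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < SIZE and 0 <= new_col < SIZE:
        return (new_row, new_col)
    return state


def action_for(state, next_state):
    """Toy stand-in for a transition model: find any action that causes state -> next_state."""
    for action in ACTIONS:
        if step(state, action) == next_state:
            return action
    return None  # the requested transition is not reachable in one step


states = list(product(range(SIZE), repeat=2))

# A state-action-value table needs one entry per (state, action) pair.
state_action_pairs = {(s, a) for s in states for a in ACTIONS}

# A table indexed by transitions needs one entry per reachable (state, next_state) pair.
state_transition_pairs = {(s, step(s, a)) for s in states for a in ACTIONS}

print(len(state_action_pairs), len(state_transition_pairs))  # prints: 45 33
```

In this toy world several actions collapse onto the same next state (for example, every blocked move and "stay" all leave the agent where it is), so the transition-indexed table is smaller. How large the saving is in practice depends entirely on the environment.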

Abstract

Reinforcement Learning (RL) based methods have achieved remarkable success in solving serial decision-making and control problems in recent years. In conventional RL formulations, the Markov Decision Process (MDP) and the state-action-value function are the basis for problem modeling and policy evaluation. However, several challenging issues remain. Among the most cited issues, the enormity of the state/action space is an important factor that causes inefficiency in accurately approximating the state-action-value function. We observe that although actions directly define the agents' behaviors, for many problems the next state after a state transition matters more than the action taken in determining the return of that transition. In this regard, we propose a new learning paradigm, State Action Separable Reinforcement Learning (sasRL), wherein the action space is decoupled from…

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Supply Chain and Inventory Management