Control with adaptive Q-learning

João Pedro Araújo; Mário A. T. Figueiredo; Miguel Ayala Botto

arXiv:2011.02141 · cs.LG · November 5, 2020 · 1 citation

TL;DR

This paper evaluates adaptive Q-learning (AQL) and single-partition adaptive Q-learning (SPAQL), two model-free episodic reinforcement learning algorithms, on the Pendulum and Cartpole control problems, and proposes SPAQL-TS, a SPAQL variant tailored to regulator design. The learned time-invariant policies are interpretable and outperform the time-variant ones on both problems.

Contribution

The paper proposes SPAQL-TS, an improved version of SPAQL tailored to regulator design, and evaluates SPAQL and SPAQL-TS as learners of time-invariant policies with improved sample efficiency and interpretability for control tasks.

Findings

01 SPAQL-TS successfully solves the Cartpole problem.

02 SPAQL outperforms TRPO in sample efficiency.

03 Policies learned are interpretable and effective.

Abstract

This paper evaluates adaptive Q-learning (AQL) and single-partition adaptive Q-learning (SPAQL), two algorithms for efficient model-free episodic reinforcement learning (RL), in two classical control problems (Pendulum and Cartpole). AQL adaptively partitions the state-action space of a Markov decision process (MDP) while learning the control policy, i.e., the mapping from states to actions. The main difference between AQL and SPAQL is that the latter learns time-invariant policies, where the mapping from states to actions does not depend explicitly on the time step. This paper also proposes SPAQL with terminal state (SPAQL-TS), an improved version of SPAQL tailored for the design of regulators for control problems. The time-invariant policies are shown to result in better performance than the time-variant ones in both problems studied. These algorithms are particularly fitted…
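The adaptive partitioning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Cell` and `Partition` classes, the one-dimensional state and action ranges scaled to [0, 1], the fixed `split_threshold`, and the plain learning-rate update are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality, so list.remove targets this exact cell
class Cell:
    """Hypothetical cell covering a rectangle of a 1-D state x 1-D action space."""
    s_lo: float
    s_hi: float
    a_lo: float
    a_hi: float
    q: float          # one Q-value estimate shared by the whole cell
    visits: int = 0

    def contains_state(self, s):
        return self.s_lo <= s <= self.s_hi

    def action(self):
        # Play the midpoint of the cell's action interval.
        return 0.5 * (self.a_lo + self.a_hi)

    def split(self):
        # Refine into four children that inherit the parent's estimate.
        s_mid = 0.5 * (self.s_lo + self.s_hi)
        a_mid = 0.5 * (self.a_lo + self.a_hi)
        return [
            Cell(s0, s1, a0, a1, self.q)
            for s0, s1 in ((self.s_lo, s_mid), (s_mid, self.s_hi))
            for a0, a1 in ((self.a_lo, a_mid), (a_mid, self.a_hi))
        ]


class Partition:
    """Hypothetical adaptive partition; starts as one cell covering everything."""

    def __init__(self, q_init):
        self.cells = [Cell(0.0, 1.0, 0.0, 1.0, q_init)]

    def select(self, s):
        # Greedy choice: among cells whose state range contains s, take the highest Q.
        candidates = [c for c in self.cells if c.contains_state(s)]
        return max(candidates, key=lambda c: c.q)

    def update(self, cell, target, lr, split_threshold):
        # Move the estimate toward the target, then refine well-visited cells.
        # The fixed split_threshold is a placeholder, not the paper's splitting rule.
        cell.visits += 1
        cell.q = (1.0 - lr) * cell.q + lr * target
        if cell.visits >= split_threshold:
            self.cells.remove(cell)
            self.cells.extend(cell.split())
```

Under these assumptions, a time-invariant learner in the spirit of SPAQL would keep a single `Partition` for every time step of an episode, while a time-variant learner in the spirit of AQL would keep one per time step, for example `[Partition(1.0) for _ in range(horizon)]` with a hypothetical `horizon`.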

Code & Models

Repositories

jaraujo98/SinglePartitionAdaptiveQLearning (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Control Systems Optimization

Methods: Trust Region Policy Optimization · Q-Learning