Control with adaptive Q-learning
João Pedro Araújo, Mário A. T. Figueiredo, Miguel Ayala Botto

TL;DR
This paper evaluates adaptive Q-learning (AQL) and single-partition adaptive Q-learning (SPAQL), two sample-efficient model-free reinforcement learning algorithms, on classical control problems, and reports better performance and interpretability than standard methods.
Contribution
The paper evaluates AQL and SPAQL on control tasks and proposes SPAQL-TS, a variant of SPAQL tailored to regulator design. SPAQL and SPAQL-TS learn time-invariant policies with improved sample efficiency and interpretability.
Findings
SPAQL-TS successfully solves the Cartpole problem.
SPAQL outperforms TRPO in sample efficiency.
The learned policies are interpretable and effective.
Abstract
This paper evaluates adaptive Q-learning (AQL) and single-partition adaptive Q-learning (SPAQL), two algorithms for efficient model-free episodic reinforcement learning (RL), on two classical control problems (Pendulum and Cartpole). AQL adaptively partitions the state-action space of a Markov decision process (MDP) while learning the control policy, i.e., the mapping from states to actions. The main difference between AQL and SPAQL is that the latter learns time-invariant policies, in which the mapping from states to actions does not depend explicitly on the time step. This paper also proposes SPAQL with terminal state (SPAQL-TS), an improved version of SPAQL tailored to the design of regulators for control problems. The time-invariant policies are shown to perform better than the time-variant ones in both problems studied. These algorithms are particularly fitted…
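The idea of adaptively partitioning the state-action space while learning a time-invariant policy can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Cell` and `SinglePartition`, the normalized one-dimensional state and action ranges, the learning rate, the discount factor, and the splitting rule (split a cell of side `size` once it has been visited `(1 / size) ** 2` times) are all assumptions chosen for illustration. A single partition is shared by every time step, so the greedy action depends only on the state.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare cells by identity so list.remove() is unambiguous
class Cell:
    """Square cell of a normalized [0, 1] x [0, 1] state-action space (hypothetical)."""
    s_lo: float   # lower state bound
    a_lo: float   # lower action bound
    size: float   # side length of the cell
    q: float      # current Q-value estimate for the cell
    visits: int = 0

    def contains_state(self, s: float) -> bool:
        return self.s_lo <= s <= self.s_lo + self.size

    def action(self) -> float:
        # Representative action: the centre of the cell's action interval.
        return self.a_lo + self.size / 2.0


class SinglePartition:
    """One partition shared by all time steps (illustrative sketch only)."""

    def __init__(self, q_init: float = 1.0, lr: float = 0.5, gamma: float = 0.9):
        self.lr = lr        # assumed constant learning rate
        self.gamma = gamma  # assumed discount factor
        self.cells = [Cell(0.0, 0.0, 1.0, q_init)]  # start with one coarse cell

    def select(self, s: float) -> Cell:
        # Greedy choice among the cells whose state interval contains s.
        return max((c for c in self.cells if c.contains_state(s)),
                   key=lambda c: c.q)

    def act(self, s: float) -> float:
        # Time-invariant policy: the action depends on the state only.
        return self.select(s).action()

    def update(self, cell: Cell, reward: float, next_state: float) -> None:
        cell.visits += 1
        target = reward + self.gamma * self.select(next_state).q
        cell.q += self.lr * (target - cell.q)
        # Assumed splitting rule: refine a cell once it is visited often
        # enough relative to its size, so busy regions get finer cells.
        if cell.visits >= (1.0 / cell.size) ** 2:
            self._split(cell)

    def _split(self, cell: Cell) -> None:
        # Replace the cell with four children that inherit its Q estimate.
        half = cell.size / 2.0
        self.cells.remove(cell)
        for ds in (0.0, half):
            for da in (0.0, half):
                self.cells.append(
                    Cell(cell.s_lo + ds, cell.a_lo + da, half, cell.q))
```

In an episode loop one would call `select` on the current state, apply the chosen cell's `action()` to the environment, and pass the observed reward and next state to `update`. Regions of the state-action space that are visited often end up covered by smaller cells, while rarely visited regions stay coarse.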
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Control Systems Optimization
Methods: Trust Region Policy Optimization · Q-Learning
