Automatic feature identification in least-squares policy iteration using the Koopman operator framework
Christian Mugisho Zagabe, Sebastian Peitz

TL;DR
This paper introduces a novel reinforcement learning algorithm that automatically learns features using a Koopman autoencoder, improving upon traditional methods by removing the need for fixed features or kernels.
Contribution
The paper presents the KAE-LSPI algorithm, which reformulates least-squares policy iteration through EDMD, enabling automatic feature learning without predefined kernels.
Findings
The number of features learned by KAE-LSPI is reasonable when set against the feature count used by classical LSPI.
Convergence to near-optimal policies is comparable to existing methods.
Empirical results demonstrate effective automatic feature learning.
Abstract
In this paper, we present a Koopman autoencoder-based least-squares policy iteration (KAE-LSPI) algorithm in reinforcement learning (RL). The KAE-LSPI algorithm is based on reformulating the so-called least-squares fixed-point approximation method in terms of extended dynamic mode decomposition (EDMD), thereby enabling automatic feature learning via the Koopman autoencoder (KAE) framework. The approach is motivated by the lack of a systematic choice of features or kernels in linear RL techniques. We compare the KAE-LSPI algorithm with two previous works, the classical least-squares policy iteration (LSPI) and the kernel-based least-squares policy iteration (KLSPI), using stochastic chain walk and inverted pendulum control problems as examples. Unlike previous works, no features or kernels need to be fixed a priori in our approach. Empirical results show the number of features learned by…
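To make the least-squares fixed-point approximation mentioned in the abstract concrete, here is a minimal sketch of the classical LSTD-Q solve that LSPI uses for policy evaluation. It is not the KAE-LSPI algorithm: the features are hand-fixed one-hot vectors, whereas the paper learns them with a Koopman autoencoder. The toy two-state MDP, the function names (`one_hot`, `lstdq`), and the small `ridge` term are all assumptions made for illustration.

```python
import numpy as np

N_STATES, N_ACTIONS = 2, 2


def one_hot(state, action):
    """Hypothetical tabular feature map phi(s, a); KAE-LSPI would learn this instead."""
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[state * N_ACTIONS + action] = 1.0
    return phi


def lstdq(phi, phi_next, rewards, gamma, ridge=1e-8):
    """Solve the least-squares fixed-point equations A w = b.

    A = Phi^T (Phi - gamma * Phi'),  b = Phi^T r,
    where the rows of Phi' are phi(s', pi(s')).
    The ridge term is an assumption added only for numerical safety.
    """
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A + ridge * np.eye(A.shape[0]), b)


# Toy deterministic MDP (illustrative only): action 0 stays, action 1 switches
# state; the reward is 1 whenever the next state is 1.
samples = [(0, 0, 0.0, 0), (0, 1, 1.0, 1), (1, 0, 1.0, 1), (1, 1, 0.0, 0)]
policy = [1, 0]  # policy under evaluation: go to state 1 and stay there
gamma = 0.9

phi = np.array([one_hot(s, a) for s, a, _, _ in samples])
phi_next = np.array([one_hot(s2, policy[s2]) for _, _, _, s2 in samples])
rewards = np.array([r for _, _, r, _ in samples])

w = lstdq(phi, phi_next, rewards, gamma)

# Policy improvement step: act greedily with respect to Q(s, a) = phi(s, a)^T w.
q_table = w.reshape(N_STATES, N_ACTIONS)
greedy_policy = q_table.argmax(axis=1)

print(np.round(w, 3))   # approximately [ 9. 10. 10.  9.]
print(greedy_policy)    # [1 0]
```

With one-hot features the solve recovers the exact Q-values of the evaluated policy, and the greedy policy equals the evaluated one, so policy iteration would stop here. In problems where a tabular representation is not available, the quality of the result depends on the choice of `phi`, which is the gap the paper aims to close by learning the features.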
