Representation Policy Iteration
Sridhar Mahadevan

TL;DR
This paper introduces a theoretically rigorous framework for automatically generating basis functions for value function approximation in large MDPs, leveraging Riemannian geometry and spectral analysis to improve policy learning.
Contribution
It presents a novel, coordinate-free method for creating basis functions using manifold theory, enabling automatic learning of representations and policies in MDPs.
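On a discrete state space, a common stand-in for the Laplace-Beltrami operator is the graph Laplacian of a graph whose vertices are states and whose edges link neighbouring states. Its smoothest eigenvectors can then serve as basis functions. The sketch below is illustrative only. The 5x5 grid world, the choice of the symmetric normalized Laplacian, and the helper names `grid_adjacency` and `laplacian_basis` are hypothetical. The paper's exact graph construction and Laplacian variant may differ.

```python
import numpy as np


def grid_adjacency(rows: int, cols: int) -> np.ndarray:
    """Symmetric adjacency matrix of a 4-connected rows x cols grid (hypothetical toy domain)."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            s = r * cols + c
            if r + 1 < rows:  # edge to the state below
                W[s, s + cols] = W[s + cols, s] = 1.0
            if c + 1 < cols:  # edge to the state on the right
                W[s, s + 1] = W[s + 1, s] = 1.0
    return W


def laplacian_basis(W: np.ndarray, k: int):
    """Return the k smallest eigenvalues and matching eigenvectors (as columns)
    of the symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)  # vertex degrees; assumes no isolated states
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # ascending eigenvalues, orthonormal columns
    return eigvals[:k], eigvecs[:, :k]


W = grid_adjacency(5, 5)
lams, Phi = laplacian_basis(W, k=6)
print(Phi.shape)  # (25, 6): one row per state, one column per basis function
```

Because `np.linalg.eigh` returns orthonormal eigenvectors, the columns of `Phi` form an orthonormal set. The first column corresponds to eigenvalue 0 and varies least over the graph, while later columns oscillate more. This ordering is what lets a small number of them capture smooth value functions.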
Findings
Representation Policy Iteration (RPI) outperforms hand-coded basis functions in the reported experiments
Basis functions reflect the topology of the state space
Framework is compatible with existing approximate MDP solvers
Abstract
This paper addresses a fundamental issue central to approximation methods for solving large Markov decision processes (MDPs): how can the underlying representation for value function approximation be learned automatically? A novel theoretically rigorous framework is proposed that automatically generates geometrically customized orthonormal sets of basis functions, which can be used with any approximate MDP solver, such as least-squares policy iteration (LSPI). The key innovation is a coordinate-free representation of value functions, using the theory of smooth functions on a Riemannian manifold. Hodge theory yields a constructive method for generating basis functions for approximating value functions based on the eigenfunctions of the self-adjoint (Laplace-Beltrami) operator on manifolds. In effect, this approach performs a global Fourier analysis on the state space graph to approximate value…
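To illustrate how such a basis plugs into a least-squares solver, the sketch below uses Laplacian eigenvectors of a chain graph as features for model-based LSTD policy evaluation. LSTD is the state-value analogue of the LSTDQ evaluation step inside LSPI. It is a minimal sketch under stated assumptions, not the paper's experimental setup. The 20-state chain, the uniformly random policy, the uniform state weighting, and the helper names are all hypothetical. It evaluates one fixed policy; LSPI would instead use state-action features and alternate evaluation with greedy improvement.

```python
import numpy as np


def path_laplacian_basis(n: int, k: int) -> np.ndarray:
    """k smoothest eigenvectors of the combinatorial Laplacian D - W of an n-state chain."""
    W = np.zeros((n, n))
    i = np.arange(n - 1)
    W[i, i + 1] = 1.0
    W[i + 1, i] = 1.0
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)  # columns ordered from smoothest to most oscillatory
    return vecs[:, :k]


def chain_policy_model(n: int, p_right: float = 0.5):
    """Transition matrix and reward of a fixed policy on a hypothetical chain:
    step right with prob p_right, else left; walls reflect; reward 1 in the last state."""
    P = np.zeros((n, n))
    for s in range(n):
        P[s, min(s + 1, n - 1)] += p_right
        P[s, max(s - 1, 0)] += 1.0 - p_right
    R = np.zeros(n)
    R[-1] = 1.0
    return P, R


def lstd_weights(Phi: np.ndarray, P: np.ndarray, R: np.ndarray, gamma: float) -> np.ndarray:
    """Model-based LSTD with uniform state weighting:
    solve Phi^T (Phi - gamma P Phi) w = Phi^T R for the feature weights w."""
    A = Phi.T @ (Phi - gamma * (P @ Phi))
    b = Phi.T @ R
    return np.linalg.solve(A, b)


n, gamma = 20, 0.9
P, R = chain_policy_model(n)  # p_right = 0.5: a uniformly random policy
V_true = np.linalg.solve(np.eye(n) - gamma * P, R)  # exact value function

errors = []
for k in (3, 5, 10, n):
    Phi = path_laplacian_basis(n, k)
    V_hat = Phi @ lstd_weights(Phi, P, R, gamma)
    errors.append(np.linalg.norm(V_hat - V_true))
    print(f"k={k:2d}  L2 error={errors[-1]:.4f}")
```

With `p_right = 0.5` the transition matrix equals I - L/2, so the Laplacian eigenvectors are also eigenvectors of P. In that special case the LSTD solution coincides with the orthogonal projection of the true value function, and the error shrinks as k grows. With k = n the basis is complete and the exact values are recovered. For other policies this alignment does not hold, which is the situation where a representation learned from the state-space graph is being tested.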
Taxonomy
Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Model Reduction and Neural Networks
