Representation Policy Iteration

Sridhar Mahadevan

arXiv:1207.1408 · cs.AI · July 9, 2012 · 33 cites

TL;DR

This paper introduces a theoretically rigorous framework for automatically generating basis functions for value function approximation in large Markov decision processes (MDPs), using Riemannian geometry and spectral analysis of the state space so that representations and policies can be learned together.

Contribution

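It presents a novel, coordinate-free method for constructing basis functions from manifold theory: the basis functions are eigenfunctions of the Laplace-Beltrami operator on the state space, which lets an agent learn both representations and policies automatically in MDPs.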
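On a finite state space, the manifold construction is usually approximated by a graph: states are vertices, observed transitions are edges, and the discrete analogue of the Laplace-Beltrami operator is the graph Laplacian. The sketch below assumes an undirected, unweighted state graph and the combinatorial Laplacian L = D − A (the paper may use a normalized variant); its smoothest eigenvectors, those with the smallest eigenvalues, serve as basis functions. `graph_laplacian`, `jacobi_eigh`, and `laplacian_basis` are hypothetical helpers, and cyclic Jacobi rotations are used only to keep the example dependency-free.

```python
import math


def graph_laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected, unweighted graph (assumed form)."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def jacobi_eigh(A, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) sorted by ascending eigenvalue;
    eigenvectors[k] is the unit-norm eigenvector for eigenvalues[k].
    """
    n = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates eigenvectors as columns
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda k: A[k][k])
    values = [A[k][k] for k in order]
    vectors = [[V[i][k] for i in range(n)] for k in order]
    return values, vectors


def laplacian_basis(n, edges, k):
    """The k smoothest Laplacian eigenvectors, used as basis functions over the n states."""
    _, vectors = jacobi_eigh(graph_laplacian(n, edges))
    return vectors[:k]


if __name__ == "__main__":
    chain = [(i, i + 1) for i in range(4)]  # hypothetical 5-state chain 0-1-2-3-4
    for vec in laplacian_basis(5, chain, 3):
        print([round(x, 3) for x in vec])
```

On a 5-state chain the eigenvalues come out as 2 − 2cos(πk/5) for k = 0, ..., 4, and the first basis function is constant, as expected for a connected graph.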

Findings

01. RPI (Representation Policy Iteration) outperforms handcoded basis functions in the reported experiments.
02. The learned basis functions reflect the topology of the state space.
03. The framework is compatible with existing approximate MDP solvers such as LSPI.
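Finding 02 says the learned basis functions reflect the topology of the state space. A quick way to see this effect (an illustration, not the paper's experiment) is the second-smallest Laplacian eigenvector, often called the Fiedler vector, on a hypothetical two-room grid joined by a single door: it takes one sign throughout one room and the opposite sign throughout the other. This sketch swaps in deflated power iteration on cI − L instead of a full eigensolver; the grid layout, the shift c, and the helper names `two_room_graph` and `fiedler_vector` are assumptions.

```python
import math
import random


def two_room_graph():
    """Hypothetical 3x7 grid: columns 0-2 and 4-6 are rooms; column 3 is a wall with a door at row 1."""
    cells = [(r, c) for r in range(3) for c in range(7) if c != 3 or r == 1]
    index = {cell: i for i, cell in enumerate(cells)}
    edges = []
    for (r, c), i in index.items():
        for nb in ((r + 1, c), (r, c + 1)):  # 4-neighbour moves, each edge added once
            if nb in index:
                edges.append((i, index[nb]))
    return cells, edges


def fiedler_vector(n, edges, iters=5000, seed=0):
    """Power iteration on M = c*I - L with the constant eigenvector projected out each step."""
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    c = 2.0 * max(len(nb) for nb in neighbours)  # c >= largest eigenvalue of L, so M is PSD
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]  # deflate the eigenvalue-0 (constant) eigenvector
        # y = (c*I - L) x, using (L x)_i = deg(i) * x_i - sum of neighbour values
        y = [c * x[i] - (len(neighbours[i]) * x[i] - sum(x[j] for j in neighbours[i]))
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


if __name__ == "__main__":
    cells, edges = two_room_graph()
    f = fiedler_vector(len(cells), edges)
    for r in range(3):
        print(["  door" if False else ("   ###" if (r, c) not in cells else f"{f[cells.index((r, c))]:+.2f}")
               for c in range(7)])
```

The door cell sits on the sign change, so thresholding this single basis function already separates the two rooms; a hand-coded polynomial basis in grid coordinates would not see the wall at all.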

Abstract

This paper addresses a fundamental issue central to approximation methods for solving large Markov decision processes (MDPs): how can the underlying representation for value function approximation be learned automatically? A novel, theoretically rigorous framework is proposed that automatically generates geometrically customized orthonormal sets of basis functions, which can be used with any approximate MDP solver such as least-squares policy iteration (LSPI). The key innovation is a coordinate-free representation of value functions, using the theory of smooth functions on a Riemannian manifold. Hodge theory yields a constructive method for generating basis functions for approximating value functions based on the eigenfunctions of the self-adjoint (Laplace-Beltrami) operator on manifolds. In effect, this approach performs a global Fourier analysis on the state space graph to approximate value…
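The abstract says the basis can be plugged into any approximate MDP solver such as LSPI. As a hedged illustration of that plumbing (not the paper's RPI algorithm), the sketch below evaluates a fixed policy on a hypothetical 20-state chain with model-based LSTD, the least-squares policy-evaluation step that LSPI builds on (LSPI itself uses state-action features via LSTDQ). For a chain graph the Laplacian eigenvectors have the closed form cos(πj(2i+1)/(2n)), a discrete cosine basis, which makes the "global Fourier analysis" reading concrete. The chain, its transition probabilities, the reward, γ, and all function names are assumptions.

```python
import math


def chain_laplacian_basis(n, k):
    """First k Laplacian eigenvectors of an n-state chain in closed form (DCT-II), unit-normalised."""
    basis = []
    for j in range(k):
        v = [math.cos(math.pi * j * (2 * i + 1) / (2 * n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis


def chain_policy(n, p_move=0.9):
    """Hypothetical fixed policy: step right with prob p_move, else stay; the last state is absorbing."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][min(i + 1, n - 1)] += p_move
        P[i][i] += 1.0 - p_move
    return P


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x


def lstd_values(basis, P, R, gamma):
    """Model-based LSTD with uniform state weighting: solve Phi^T (Phi - gamma P Phi) w = Phi^T R."""
    n, k = len(R), len(basis)
    phi = [[basis[j][i] for j in range(k)] for i in range(n)]  # phi[i] = features of state i
    p_phi = [[sum(P[i][s] * phi[s][j] for s in range(n)) for j in range(k)] for i in range(n)]
    A = [[sum(phi[i][r] * (phi[i][c] - gamma * p_phi[i][c]) for i in range(n)) for c in range(k)]
         for r in range(k)]
    rhs = [sum(phi[i][r] * R[i] for i in range(n)) for r in range(k)]
    w = solve(A, rhs)
    return [sum(phi[i][j] * w[j] for j in range(k)) for i in range(n)]


def exact_values(P, R, gamma, iters=1000):
    """Reference solution by iterating V <- R + gamma P V."""
    n = len(R)
    V = [0.0] * n
    for _ in range(iters):
        V = [R[i] + gamma * sum(P[i][s] * V[s] for s in range(n)) for i in range(n)]
    return V


if __name__ == "__main__":
    n, gamma = 20, 0.9
    P = chain_policy(n)
    R = [1.0 if i == n - 1 else 0.0 for i in range(n)]
    approx = lstd_values(chain_laplacian_basis(n, 5), P, R, gamma)
    print([round(v, 2) for v in approx])
```

With all n basis functions the LSTD solution reproduces the exact values; truncating to the k smoothest functions gives a compressed approximation whose quality depends on how smooth the value function is over the graph.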

Taxonomy

Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Model Reduction and Neural Networks