Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control

Reza Moazzez Estanjini; Xu Chu Ding; Morteza Lahijanian; Jing Wang; Calin A. Belta; Ioannis Ch. Paschalidis

arXiv:1108.4698·cs.RO·August 31, 2011·1 cites

TL;DR

This paper introduces an actor-critic algorithm based on least squares temporal difference learning for probabilistic control problems in robotics, specifically for maximizing the probability of reaching a set of target states while avoiding others.

Contribution

It transforms a probabilistic robot motion control problem into a stochastic shortest path problem and develops a new approximate dynamic programming method with proven convergence.
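To make the underlying objective concrete, here is a minimal sketch of the quantity being maximized: the probability of reaching goal states before hitting states to avoid. It uses plain value iteration on a hypothetical 5-state chain, not the paper's SSP transformation or its actor-critic method; the chain, the 0.8/0.2 transition probabilities, and all names are assumptions made for illustration only.

```python
# Hypothetical 5-state chain MDP (an assumption for illustration):
# state 0 must be avoided, state 4 is the goal.
# Each action moves in the intended direction with probability 0.8
# and in the opposite direction with probability 0.2.
UNSAFE, GOAL = {0}, {4}
STATES = range(5)
ACTIONS = ("left", "right")


def transitions(s, a):
    """Return a list of (next_state, probability) pairs."""
    step = 1 if a == "right" else -1
    return [(s + step, 0.8), (s - step, 0.2)]


def max_reach_probability(tol=1e-12, max_iters=10_000):
    # v[s] approximates the maximal probability of reaching GOAL before UNSAFE.
    v = [1.0 if s in GOAL else 0.0 for s in STATES]
    for _ in range(max_iters):
        new_v = list(v)
        for s in STATES:
            if s in GOAL or s in UNSAFE:
                continue  # absorbing states keep their values (1 and 0)
            new_v[s] = max(
                sum(p * v[t] for t, p in transitions(s, a)) for a in ACTIONS
            )
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return v


values = max_reach_probability()
```

Exact value iteration like this needs the full transition model and a value per state; the approach summarized above instead works from sample paths and a parameterized policy class, which is what makes it usable on larger problems.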

Findings

01. The algorithm finds effective policies in simulated robot scenarios.

02. The algorithm is shown to converge to a policy corresponding to a stationary point in the parameter space.

03. Simulation results confirm the algorithm's practical applicability.

Abstract

We consider the problem of finding a control policy for a Markov Decision Process (MDP) to maximize the probability of reaching some states while avoiding some other states. This problem is motivated by applications in robotics, where such problems naturally arise when probabilistic models of robot motion are required to satisfy temporal logic task specifications. We transform this problem into a Stochastic Shortest Path (SSP) problem and develop a new approximate dynamic programming algorithm to solve it. This algorithm is of the actor-critic type and uses a least-squares temporal difference learning method. It operates on sample paths of the system and optimizes the policy within a pre-specified class parameterized by a parsimonious set of parameters. We show its convergence to a policy corresponding to a stationary point in the parameters' space. Simulation results confirm the…
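As a rough illustration of the least squares temporal difference idea mentioned in the abstract, the sketch below evaluates one fixed policy from sample paths using generic LSTD(0). It is not the paper's actor-critic algorithm: there is no actor update, and the 5-state chain, unit step cost, one-hot features, and every function name are assumptions chosen to keep the example small.

```python
import random

TERMINAL = {0, 4}            # hypothetical absorbing states of a 5-state chain
FEATURE_STATES = (1, 2, 3)   # one one-hot feature per non-terminal state


def phi(s):
    """One-hot features; terminal states map to the zero vector."""
    return [1.0 if s == f else 0.0 for f in FEATURE_STATES]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lstd0(episodes=20_000, seed=0):
    """Accumulate A = sum phi(s)(phi(s) - phi(s'))^T and b = sum phi(s) * cost."""
    rng = random.Random(seed)
    n = len(FEATURE_STATES)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for _ in range(episodes):
        s = rng.choice(FEATURE_STATES)
        while s not in TERMINAL:
            # Fixed policy: try to move right; succeeds with probability 0.8.
            s_next = s + 1 if rng.random() < 0.8 else s - 1
            cost = 1.0  # unit cost per step, so the value is expected steps to go
            f, f_next = phi(s), phi(s_next)
            for i in range(n):
                b[i] += f[i] * cost
                for j in range(n):
                    A[i][j] += f[i] * (f[j] - f_next[j])
            s = s_next
    return solve(A, b)


weights = lstd0()
```

With one-hot features the solved weights are simply per-state estimates of the expected cost-to-go; with a smaller feature set the same two accumulators give a least squares fit instead, which is the property a critic of this kind relies on.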

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Control Systems Optimization