Costate-focused models for reinforcement learning

Bita Behrouzi; Xuefei Liu; Douglas Tweed

arXiv:1711.05817 · cs.LG · October 4, 2018 · 1 citation

TL;DR

This paper introduces a reinforcement learning approach founded on the costate equation and a model of the state dynamics. The costate is used both to improve the policy and to focus the model on the environment features most relevant to the task, and the method performs well on time-optimal control tasks.

Contribution

It presents a novel costate-focused method for reinforcement learning that combines an environment model with the costate equation, as an alternative to the many recent algorithms that are model-free and founded on the Bellman equation.

Findings

01

Effective in solving difficult time-optimal control problems

02

Compares favorably with deep deterministic policy gradient (DDPG) on the tested tasks

03

Can also learn from mental practice, because it builds a model of the environment

Abstract

Many recent algorithms for reinforcement learning are model-free and founded on the Bellman equation. Here we present a method founded on the costate equation and models of the state dynamics. We use the costate (the gradient of cost with respect to state) to improve the policy and also to "focus" the model, training it to detect and mimic those features of the environment that are most relevant to its task. We show that this method can handle difficult time-optimal control problems, driving deterministic or stochastic mechanical systems quickly to a target. On these tasks it works well compared to deep deterministic policy gradient, a recent Bellman method. And because it creates a model, the costate method can also learn from mental practice.
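To make "the costate is the gradient of cost with respect to state" concrete, here is a minimal sketch of the standard discrete-time costate (adjoint) backward pass. It is not the authors' implementation. It assumes known linear dynamics x_{t+1} = A x_t + B u_t, a quadratic cost, and an open-loop action sequence in place of a learned policy network; the point-mass system, the weights, and the function names (rollout, total_cost, costate_gradients) are all hypothetical. The backward recursion lam_t = 2 Q x_t + A^T lam_{t+1} is the costate equation for this cost, and dJ/du_t = 2 R u_t + B^T lam_{t+1} is the gradient used to improve the actions.

```python
import numpy as np

def rollout(A, B, x0, us):
    """Simulate x_{t+1} = A x_t + B u_t and return the states x_0..x_T."""
    xs = [x0]
    for u in us:
        xs.append(A @ xs[-1] + B @ u)
    return xs

def total_cost(xs, us, Q, R, Qf):
    """Quadratic running cost over t = 0..T-1 plus a terminal cost at x_T."""
    running = sum(x @ Q @ x + u @ R @ u for x, u in zip(xs[:-1], us))
    return float(running + xs[-1] @ Qf @ xs[-1])

def costate_gradients(A, B, xs, us, Q, R, Qf):
    """Backward pass: lam[t] = dJ/dx_t (actions held fixed), g[t] = dJ/du_t.

    Assumes Q, R, Qf are symmetric, so d(x'Qx)/dx = 2Qx.
    """
    T = len(us)
    lam = [None] * (T + 1)
    g = [None] * T
    lam[T] = 2 * Qf @ xs[T]                        # terminal condition
    for t in reversed(range(T)):
        g[t] = 2 * R @ us[t] + B.T @ lam[t + 1]    # u_t acts on cost only through x_{t+1}
        lam[t] = 2 * Q @ xs[t] + A.T @ lam[t + 1]  # discrete costate equation
    return lam, g

# Hypothetical test system: a 1-D point mass (double integrator) to be driven to the origin.
dt, T = 0.1, 20
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R, Qf = np.diag([1.0, 0.1]), 0.01 * np.eye(1), 10.0 * np.eye(2)
x0 = np.array([1.0, 0.0])
us = [np.zeros(1) for _ in range(T)]

initial_cost = total_cost(rollout(A, B, x0, us), us, Q, R, Qf)
lr = 0.01
for _ in range(200):                               # plain gradient descent on the actions
    xs = rollout(A, B, x0, us)
    _, g = costate_gradients(A, B, xs, us, Q, R, Qf)
    us = [u - lr * gt for u, gt in zip(us, g)]
final_cost = total_cost(rollout(A, B, x0, us), us, Q, R, Qf)
print(f"cost: {initial_cost:.3f} -> {final_cost:.3f}")
```

The backward pass touches only A and B, so the same computation could be run through a learned model in place of the true dynamics. A finite-difference check of lam[0] against the cost's sensitivity to x0 is a quick way to confirm the recursion.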

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications