Costate-focused models for reinforcement learning
Bita Behrouzi, Xuefei Liu, Douglas Tweed

TL;DR
This paper introduces a costate-based reinforcement learning approach that uses models of the state dynamics together with the costate equation to improve the policy and to focus the model on task-relevant features of the environment. It performs well on difficult time-optimal control tasks.
Contribution
The paper presents a reinforcement learning method founded on environment models and the costate equation rather than on the Bellman equation. Because the method builds a model, it can also learn from mental practice.
Findings
Effective in solving difficult time-optimal control problems
Works well compared to deep deterministic policy gradient (DDPG) on the tested tasks
Can learn from mental practice through environment modeling
Abstract
Many recent algorithms for reinforcement learning are model-free and founded on the Bellman equation. Here we present a method founded on the costate equation and models of the state dynamics. We use the costate -- the gradient of cost with respect to state -- to improve the policy and also to "focus" the model, training it to detect and mimic those features of the environment that are most relevant to its task. We show that this method can handle difficult time-optimal control problems, driving deterministic or stochastic mechanical systems quickly to a target. On these tasks it works well compared to deep deterministic policy gradient, a recent Bellman method. And because it creates a model, the costate method can also learn from mental practice.
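The abstract defines the costate as the gradient of cost with respect to state and says it is used to improve the policy. The sketch below illustrates that idea with the standard discrete-time costate recursion from optimal control. It is not the authors' algorithm. The toy double integrator, the quadratic cost (the paper studies time-optimal tasks), the open-loop control sequence standing in for a policy, and every name and constant here are assumptions made for illustration. The dynamics are taken as known, so the model "focusing" step is not covered.

```python
# Illustrative sketch only: costate recursion on a toy double integrator.
# Dynamics, cost, constants, and function names are all assumptions.

DT = 0.1  # time step (assumed)
R = 0.1   # control penalty weight (assumed)


def step(x, u):
    """Toy known dynamics: x = (position, velocity), u = force."""
    pos, vel = x
    return (pos + DT * vel, vel + DT * u)


def running_cost(x, u):
    pos, vel = x
    return 0.5 * DT * (pos * pos + vel * vel + R * u * u)


def rollout(x0, controls):
    """Return the visited states and total cost of an open-loop plan."""
    xs = [x0]
    total = 0.0
    for u in controls:
        total += running_cost(xs[-1], u)
        xs.append(step(xs[-1], u))
    pos, vel = xs[-1]
    total += 0.5 * (pos * pos + vel * vel)  # terminal cost
    return xs, total


def costates(xs, controls):
    """Backward recursion: lam[t] = dJ/dx_t with the controls held fixed."""
    T = len(controls)
    lam = [None] * (T + 1)
    lam[T] = xs[T]  # gradient of the terminal cost 0.5 * |x|^2
    for t in range(T - 1, -1, -1):
        pos, vel = xs[t]
        lp, lv = lam[t + 1]
        # lam_t = dc/dx_t + (df/dx_t)^T lam_{t+1}, with df/dx = [[1, DT], [0, 1]]
        lam[t] = (DT * pos + lp, DT * vel + DT * lp + lv)
    return lam


def control_gradient(controls, lam):
    """dJ/du_t = dc/du_t + (df/du_t)^T lam_{t+1}, with df/du = (0, DT)."""
    return [DT * R * u + DT * lam[t + 1][1] for t, u in enumerate(controls)]


def improve(x0, controls, lr=0.5, iters=200):
    """Plain gradient descent on the controls, driven by the costate."""
    for _ in range(iters):
        xs, _ = rollout(x0, controls)
        lam = costates(xs, controls)
        grad = control_gradient(controls, lam)
        controls = [u - lr * g for u, g in zip(controls, grad)]
    return controls
```

Calling `improve((1.0, 0.0), [0.0] * 20)` returns a control sequence whose rollout cost is lower than that of the all-zero plan. Because the recursion only needs derivatives of the dynamics and cost, the same update could in principle be run on rollouts of a learned model, which is the sense in which a model-based method can practice "mentally".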
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications
