Continuous-Time Fitted Value Iteration for Robust Policies

Michael Lutter; Boris Belousov; Shie Mannor; Dieter Fox; Animesh Garg; Jan Peters

arXiv:2110.01954 · cs.RO · October 6, 2021

TL;DR

This paper introduces continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI), two algorithms for solving the Hamilton-Jacobi-Bellman and Hamilton-Jacobi-Isaacs equations in continuous control. They yield optimal and robust policies without discretization and are demonstrated on physical systems such as pendulums and cartpoles.

Contribution

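The paper derives the optimal policy and the optimal adversary in closed form for continuous control problems with control-affine dynamics and separable state and action rewards. This simplifies solving the Hamilton-Jacobi-Bellman and Hamilton-Jacobi-Isaacs equations and yields policies that are robust to perturbations of the dynamics.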
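
To make the closed-form claim concrete, the following is a sketch of the standard first-order argument, not a reproduction of the paper's derivation. It assumes control-affine dynamics $\dot{x} = a(x) + B(x)u$, a separable reward $r(x, u) = q(x) - g(u)$ with strictly convex $g$, and a discount rate $\rho$. All of these symbols are notation chosen here for illustration, and the quadratic cost in the last line is an additional assumption.

```latex
\begin{align}
\rho V^*(x) &= \max_u \left[ q(x) - g(u) + \nabla_x V^*(x)^\top \big( a(x) + B(x)\,u \big) \right] \\
0 &= -\nabla g(u^*) + B(x)^\top \nabla_x V^*(x) \\
u^* &= (\nabla g)^{-1}\!\left( B(x)^\top \nabla_x V^*(x) \right) \\
u^* &= R^{-1} B(x)^\top \nabla_x V^*(x)
      \quad \text{if } g(u) = \tfrac{1}{2} u^\top R\, u,\ R \succ 0
\end{align}
```

Because the maximization over actions is solved analytically, the remaining equation involves only the value function and its gradient.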

Findings

01. Algorithms achieve optimal policies in control tasks

02. Robust FVI outperforms deep RL in perturbation scenarios

03. Methods work effectively on physical systems like pendulums

Abstract

Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics. Especially for continuous control, solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs equation, is important as it yields the optimal policy that achieves the maximum reward on a given task. In the case of the Hamilton-Jacobi-Isaacs equation, which includes an adversary controlling the environment and minimizing the reward, the obtained policy is also robust to perturbations of the dynamics. In this paper we propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI). These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems to derive the optimal policy and optimal adversary in closed form. This analytic expression simplifies the…
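
The closed-form policy mentioned in the abstract can be illustrated numerically. The sketch below is not taken from the linked repository. It assumes a quadratic action cost `0.5 * u^T R u`, under which the action that maximizes the action-dependent part of the Hamilton-Jacobi-Bellman right-hand side is `R^{-1} B^T grad_V`. The function names and the toy values for `B`, `R`, and `grad_V` are hypothetical.

```python
import numpy as np


def closed_form_action(grad_V, B, R):
    """Maximizer of grad_V^T B u - 0.5 u^T R u (quadratic action cost is an assumption)."""
    return np.linalg.solve(R, B.T @ grad_V)


def action_term(u, grad_V, B, R):
    """Action-dependent part of the HJB right-hand side for control-affine dynamics."""
    return grad_V @ (B @ u) - 0.5 * u @ R @ u


# Hypothetical toy problem: 2-D state, 1-D action.
B = np.array([[0.0], [1.0]])      # control matrix B(x) at some state x
R = np.array([[0.5]])             # positive definite action-cost matrix
grad_V = np.array([0.3, -1.2])    # value-function gradient at x

u_star = closed_form_action(grad_V, B, R)

# Compare against randomly sampled actions: none should score higher.
rng = np.random.default_rng(0)
candidates = rng.uniform(-10.0, 10.0, size=(1000, 1))
best_sampled = max(action_term(u, grad_V, B, R) for u in candidates)

print("closed-form action:", u_star)
print("closed-form score:", action_term(u_star, grad_V, B, R))
print("best sampled score:", best_sampled)
```

In a fitted value iteration setting, `grad_V` would come from differentiating a learned value function rather than being fixed by hand. The point of the sketch is only that no inner optimization over actions is needed once the gradient is available.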

Code & Models

Repositories

milutter/value_iteration
pytorch

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics