A Differential Dynamic Programming Framework for Inverse Reinforcement Learning

Kun Cao; Xinhang Xu; Wanxin Jin; Karl H. Johansson; Lihua Xie

arXiv:2407.19902·cs.RO·July 30, 2024

TL;DR

This paper introduces a DDP-based inverse reinforcement learning framework that efficiently recovers cost function parameters, dynamics, and constraints from demonstrations, with a novel closed-loop loss function outperforming traditional open-loop methods.

Contribution

It presents a new DDP-based IRL method that handles both equality and inequality constraints and introduces a closed-loop IRL framework capturing demonstration dynamics.

Findings

1. Validated through robot and quadrotor experiments
2. Proven to recover parameters under certain conditions
3. Closed-loop IRL outperforms open-loop loss functions

Abstract

A differential dynamic programming (DDP)-based framework for inverse reinforcement learning (IRL) is introduced to recover the parameters in the cost function, system dynamics, and constraints from demonstrations. Unlike existing work, where DDP was used for the inner forward problem with inequality constraints, the proposed framework uses it to efficiently compute the gradient required in the outer inverse problem with equality and inequality constraints. The equivalence between the proposed method and existing methods based on Pontryagin's Maximum Principle (PMP) is established. More importantly, building on this DDP-based IRL with an open-loop loss function, a closed-loop IRL framework is presented. In this framework, a loss function is proposed to capture the closed-loop nature of demonstrations, and it is shown to be better than the commonly used open-loop loss function. We…
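To make the open-loop versus closed-loop distinction concrete, here is a minimal sketch on a scalar linear-quadratic problem. It is not the paper's formulation: the abstract does not define either loss, so `open_loop_loss` (trajectory mismatch from a noise-free replay) and `closed_loop_loss` (mismatch between the feedback law evaluated at the demonstrated states and the demonstrated controls) are assumed readings, and all names and numbers are hypothetical. The backward Riccati recursion stands in for the DDP backward pass, which it coincides with for linear dynamics and quadratic cost, and a grid search replaces the gradient computation the paper describes.

```python
import random

def lqr_gains(a, b, q, r, T):
    """Backward Riccati recursion for x' = a*x + b*u, stage cost q*x^2 + r*u^2."""
    p = q  # terminal cost weight (assumed equal to q for simplicity)
    gains = []
    for _ in range(T):
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * (a - b * k)
        gains.append(k)
    return gains[::-1]  # gains[t] is the feedback gain applied at time t

def rollout(a, b, gains, x0, noise=None):
    """Apply u_t = -k_t * x_t, optionally with additive process noise."""
    xs, us = [x0], []
    for t, k in enumerate(gains):
        u = -k * xs[-1]
        w = noise[t] if noise is not None else 0.0
        xs.append(a * xs[-1] + b * u + w)
        us.append(u)
    return xs, us

def open_loop_loss(q, a, b, r, xs_demo):
    """Assumed open-loop loss: replay without noise, compare state trajectories."""
    gains = lqr_gains(a, b, q, r, len(xs_demo) - 1)
    xs, _ = rollout(a, b, gains, xs_demo[0])
    return sum((x - xd) ** 2 for x, xd in zip(xs, xs_demo))

def closed_loop_loss(q, a, b, r, xs_demo, us_demo):
    """Assumed closed-loop loss: compare the policy at demonstrated states."""
    gains = lqr_gains(a, b, q, r, len(us_demo))
    return sum((-k * xd - ud) ** 2 for k, xd, ud in zip(gains, xs_demo, us_demo))

# Hypothetical demonstration: an expert with cost weight q_true acting under noise.
a, b, r, q_true, T = 1.0, 0.5, 1.0, 2.0, 20
random.seed(0)
noise = [random.gauss(0.0, 0.05) for _ in range(T)]
xs_demo, us_demo = rollout(a, b, lqr_gains(a, b, q_true, r, T), 1.0, noise)

ol_true = open_loop_loss(q_true, a, b, r, xs_demo)
cl_true = closed_loop_loss(q_true, a, b, r, xs_demo, us_demo)

# Grid search over candidate weights (a stand-in for gradient descent).
grid = [0.5, 1.0, 2.0, 4.0]
q_best = min(grid, key=lambda q: closed_loop_loss(q, a, b, r, xs_demo, us_demo))
```

In this toy setting the demonstrator reacts to disturbances through feedback, so the noise-free replay cannot match the demonstrated states even at the true weight, while the policy-matching loss vanishes there. This is only an illustration of why a loss that accounts for feedback can be preferable, not a reproduction of the paper's result.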

Taxonomy

Topics: Adaptive Dynamic Programming Control