Smooth Imitation Learning via Smooth Costs and Smooth Policies

Sapana Chaudhary; Balaraman Ravindran

arXiv:2111.02354 · cs.LG · November 4, 2021

TL;DR

This paper introduces SPaCIL, an adversarial imitation learning algorithm that regularizes both the policy and the cost function for smoothness, yielding higher average returns, faster learning, and smoother control in continuous control environments.

Contribution

The paper proposes smoothness-inducing regularizers for both the policy and the cost model in adversarial imitation learning, targeting high-dimensional continuous control tasks.

Findings

01 SPaCIL outperforms existing IL algorithms on smoothness metrics.

02 SPaCIL achieves faster learning and higher average returns.

03 The method effectively produces smoother policies in continuous control environments.

Abstract

Imitation learning (IL) is a popular approach in the continuous control setting because, among other reasons, it circumvents the problems of reward misspecification and exploration in reinforcement learning (RL). In IL from demonstrations, an important challenge is to obtain agent policies that are smooth with respect to their inputs. Learning, through imitation, a policy that is smooth as a function of a large state-action ($s$-$a$) space (typical of high-dimensional continuous control environments) can be challenging. We take a first step towards tackling this issue by using smoothness-inducing regularizers on both the policy and the cost models of adversarial imitation learning. Our regularizers work by ensuring that the cost function changes in a controlled manner over the $s$-$a$ space, and that the agent policy is well behaved with respect to the state space. We call our new…
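The abstract describes two regularizers but does not give their form here. As a rough illustration only, the sketch below shows one generic way such penalties could be written: a local-perturbation penalty on the policy over states, and another on the cost model over state-action pairs. This is not the paper's definition of the SPaCIL regularizers; the toy `policy` and `cost` models, the uniform perturbation scheme, and every name and parameter (`weights`, `theta`, `eps`) are assumptions made for the example.

```python
import math
import random


def policy(state, weights):
    """Hypothetical deterministic policy: tanh of a linear map from state to action."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]


def cost(state, action, theta):
    """Hypothetical cost model: a linear score over the concatenated (s, a) vector."""
    return sum(t * x for t, x in zip(theta, state + action))


def perturb(vec, eps, rng):
    """Sample a point inside an L-infinity ball of radius eps around vec."""
    return [v + rng.uniform(-eps, eps) for v in vec]


def policy_smoothness_penalty(states, weights, eps, rng):
    """Mean squared change in the action when each state is slightly perturbed."""
    total = 0.0
    for s in states:
        a = policy(s, weights)
        a_pert = policy(perturb(s, eps, rng), weights)
        total += sum((x - y) ** 2 for x, y in zip(a, a_pert))
    return total / len(states)


def cost_smoothness_penalty(pairs, theta, eps, rng):
    """Mean squared change in the cost when each (s, a) pair is slightly perturbed."""
    total = 0.0
    for s, a in pairs:
        c = cost(s, a, theta)
        c_pert = cost(perturb(s, eps, rng), perturb(a, eps, rng), theta)
        total += (c - c_pert) ** 2
    return total / len(pairs)


# Toy data: 8 three-dimensional states and a two-dimensional action space (assumed sizes).
rng = random.Random(0)
states = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]
weights = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]
theta = [0.2, -0.1, 0.4, 0.3, -0.5]
pairs = [(s, policy(s, weights)) for s in states]

policy_pen = policy_smoothness_penalty(states, weights, eps=0.05, rng=rng)
cost_pen = cost_smoothness_penalty(pairs, theta, eps=0.05, rng=rng)
print(policy_pen, cost_pen)
```

In an adversarial IL training loop, penalties of this kind would typically be added, each with its own weighting coefficient, to the policy objective and the cost (discriminator) objective respectively. The coefficients and the choice of perturbation set are design decisions this sketch leaves open.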
