Learning Calibratable Policies using Programmatic Style-Consistency

Eric Zhan; Albert Tseng; Yisong Yue; Adith Swaminathan; Matthew Hausknecht

arXiv:1910.01179 · cs.LG · July 17, 2020 · 1 citation

TL;DR

This paper introduces a method for controllable long-term behavior generation using programmatic style labels, enabling policies to faithfully reproduce a large number of style combinations in complex environments.

Contribution

We propose a novel style-consistency learning framework that leverages programmatic labels to control multiple behavior styles simultaneously.

Findings

01

Our approach achieves calibration for up to 1024 style combinations.

02

Existing methods fail to generate diverse behaviors without explicit style enforcement.

03

The framework is validated on basketball and MuJoCo environments.

Abstract

We study the problem of controllable generation of long-term sequential behaviors, where the goal is to calibrate to multiple behavior styles simultaneously. In contrast to the well-studied areas of controllable generation of images, text, and speech, there are two questions that pose significant challenges when generating long-term behaviors: how should we specify the factors of variation to control, and how can we ensure that the generated behavior faithfully demonstrates combinatorially many styles? We leverage programmatic labeling functions to specify controllable styles, and derive a formal notion of style-consistency as a learning objective, which can then be solved using conventional policy learning approaches. We evaluate our framework using demonstrations from professional basketball players and agents in the MuJoCo physics environment, and show that existing approaches that…
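The two ideas named in the abstract, programmatic labeling functions and style-consistency, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper's code: the trajectory representation (a list of 2-D positions), the `displacement_label` function and its threshold, the `toy_policy` stand-in, and the reading of style-consistency as the fraction of generated trajectories whose programmatic label matches the label they were conditioned on.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical trajectory type: a sequence of (x, y) positions.
Trajectory = Sequence[Tuple[float, float]]


def displacement_label(traj: Trajectory, threshold: float = 1.0) -> int:
    """Hypothetical labeling function.

    Returns 1 if the straight-line distance between the first and last
    positions exceeds `threshold`, else 0.
    """
    (x0, y0), (x1, y1) = traj[0], traj[-1]
    dist = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return int(dist > threshold)


def style_consistency(
    trajs: Sequence[Trajectory],
    targets: Sequence[int],
    label_fn: Callable[[Trajectory], int],
) -> float:
    """Fraction of trajectories whose programmatic label equals the
    target label they were generated for (assumed reading of the metric)."""
    matches = sum(label_fn(t) == y for t, y in zip(trajs, targets))
    return matches / len(trajs)


def toy_policy(label: int, steps: int = 5) -> List[Tuple[float, float]]:
    """Stand-in for a learned policy: moves along x with a step size
    chosen by the requested style label (purely illustrative)."""
    step = 0.5 if label == 1 else 0.1
    return [(i * step, 0.0) for i in range(steps)]


if __name__ == "__main__":
    targets = [1, 0, 1, 0]
    rollouts = [toy_policy(y) for y in targets]
    print(style_consistency(rollouts, targets, displacement_label))
```

In this toy setting the policy is calibrated by construction. A learned policy would be scored the same way: sample a target label, roll out the policy conditioned on it, apply the labeling function to the result, and compare.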

Code & Models

Videos

Learning Calibratable Policies using Programmatic Style-Consistency · slideslive

Taxonomy

Topics: Artificial Intelligence in Games · Sports Analytics and Performance · Reinforcement Learning in Robotics