Learning Calibratable Policies using Programmatic Style-Consistency
Eric Zhan, Albert Tseng, Yisong Yue, Adith Swaminathan, Matthew Hausknecht

TL;DR
This paper introduces a method for controllable long-term behavior generation using programmatic style labels, enabling policies to faithfully reproduce a large number of style combinations in complex environments.
Contribution
We propose a novel style-consistency learning framework that leverages programmatic labels to control multiple behavior styles simultaneously.
Findings
Our approach achieves calibration for up to 1024 style combinations.
Existing methods fail to generate diverse behaviors without explicit style enforcement.
The framework is validated on basketball and MuJoCo environments.
Abstract
We study the problem of controllable generation of long-term sequential behaviors, where the goal is to calibrate to multiple behavior styles simultaneously. In contrast to the well-studied areas of controllable generation of images, text, and speech, there are two questions that pose significant challenges when generating long-term behaviors: how should we specify the factors of variation to control, and how can we ensure that the generated behavior faithfully demonstrates combinatorially many styles? We leverage programmatic labeling functions to specify controllable styles, and derive a formal notion of style-consistency as a learning objective, which can then be solved using conventional policy learning approaches. We evaluate our framework using demonstrations from professional basketball players and agents in the MuJoCo physics environment, and show that existing approaches that…
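To make the two ideas in the abstract concrete, the sketch below shows what a programmatic labeling function and a style-consistency score might look like. It is an illustration under stated assumptions, not the authors' implementation: the trajectory representation, the `speed_label` function, its threshold, and the `style_consistency` helper are all hypothetical names introduced here. Style-consistency is taken to be the fraction of generated trajectories whose programmatic label matches the style the policy was asked to produce.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a trajectory is a sequence of 2D positions, one per time step.
Trajectory = Sequence[Tuple[float, float]]


def speed_label(traj: Trajectory, threshold: float = 1.0) -> int:
    """Hypothetical labeling function: 1 if the average per-step
    displacement exceeds `threshold`, else 0."""
    if len(traj) < 2:
        return 0
    total = 0.0
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return int(total / (len(traj) - 1) > threshold)


def style_consistency(
    trajs: Sequence[Trajectory],
    target_labels: Sequence[int],
    label_fn: Callable[[Trajectory], int],
) -> float:
    """Fraction of trajectories whose programmatic label equals the
    style label they were conditioned on (assumed definition)."""
    if len(trajs) != len(target_labels) or not trajs:
        raise ValueError("need equally sized, non-empty inputs")
    matches = sum(label_fn(t) == y for t, y in zip(trajs, target_labels))
    return matches / len(trajs)


# Toy data standing in for rollouts of a style-conditioned policy.
slow = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]  # average step 0.5
fast = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]  # average step 2.0
score = style_consistency([slow, fast, fast], [0, 1, 0], speed_label)
```

In this toy example two of the three rollouts match their requested style. A policy that is well calibrated in the abstract's sense would drive this score toward 1 for every labeling function and every combination of requested labels.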
Taxonomy
Topics: Artificial Intelligence in Games · Sports Analytics and Performance · Reinforcement Learning in Robotics
