Braxlines: Fast and Interactive Toolkit for RL-driven Behavior Engineering beyond Reward Maximization

Shixiang Shane Gu; Manfred Diaz; Daniel C. Freeman; Hiroki Furuta; Seyed Kamyar Seyed Ghasemipour; Anton Raichuk; Byron David; Erik Frey; Erwin Coumans; Olivier Bachem

arXiv:2110.04686·cs.LG·October 12, 2021·1 cites

TL;DR

Braxlines is a fast, interactive toolkit that enables RL-driven behavior generation beyond reward maximization, supporting unsupervised skill learning and environment creation with minimal training time.

Contribution

The paper introduces Braxlines, a toolkit that pairs a programmatic API (Composer) with stable baselines for behavior synthesis beyond reward maximization, facilitating rapid environment and behavior development.

Findings

01. Supports unsupervised skill learning and distribution sketching.

02. Enables behavior synthesis within minutes of training.

03. Provides standardized metrics for evaluating non-reward-based algorithms.

Abstract

The goal of continuous control is to synthesize desired behaviors. In reinforcement learning (RL)-driven approaches, this is often accomplished through careful task reward engineering for efficient exploration and running an off-the-shelf RL algorithm. While reward maximization is at the core of RL, reward engineering is not the only way to specify complex behaviors, and sometimes not the easiest. In this paper, we introduce Braxlines, a toolkit for fast and interactive RL-driven behavior generation beyond simple reward maximization. It includes Composer, a programmatic API for generating continuous control environments, and a set of stable and well-tested baselines for two families of algorithms, mutual information maximization (MiMax) and divergence minimization (DMin), which support unsupervised skill learning and distribution sketching as other modes of behavior specification.…
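
The abstract names mutual information maximization without stating an objective. As a rough illustration only, one common form in the unsupervised skill-learning literature rewards the agent with log q(z|s) - log p(z), where q is a learned skill discriminator and p(z) is the skill prior. The sketch below assumes a uniform prior over discrete skills; the function name `skill_reward` and its arguments are hypothetical and are not taken from the Braxlines API, and the exact objectives used by the toolkit may differ.

```python
import numpy as np

def skill_reward(logits, skill, num_skills):
    """Illustrative intrinsic reward log q(z|s) - log p(z), uniform p(z).

    `logits` stands in for the output of a hypothetical skill discriminator
    evaluated at the current state; it is not a Braxlines data structure.
    """
    # Numerically stable log-softmax gives log q(z|s) for every skill z.
    shifted = logits - np.max(logits)
    log_q = shifted - np.log(np.sum(np.exp(shifted)))
    # Uniform prior over skills: log p(z) = -log(num_skills).
    log_prior = -np.log(num_skills)
    return float(log_q[skill] - log_prior)

# A discriminator that cannot tell skills apart yields zero reward.
uniform = skill_reward(np.zeros(4), skill=2, num_skills=4)
# A discriminator that identifies the active skill yields a positive reward,
# bounded above by log(num_skills).
confident = skill_reward(np.array([0.0, 0.0, 5.0, 0.0]), skill=2, num_skills=4)
```

Under this assumed form, the reward is positive only when the visited state makes the active skill easier to identify than chance, which is what pushes different skills toward distinguishable behaviors.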

Taxonomy

Topics: Reinforcement Learning in Robotics