Constructing an Optimal Behavior Basis for the Option Keyboard

Lucas N. Alegre; Ana L. C. Bazzan; André Barreto; Bruno C. da Silva

arXiv:2505.00787·cs.LG·November 14, 2025

TL;DR

This paper introduces a novel method to construct an optimal behavior basis for the Option Keyboard in multi-task reinforcement learning, enabling zero-shot optimal solutions for linear and certain non-linear tasks with fewer base policies.

Contribution

The paper provides an efficient way to build an optimal behavior basis that outperforms existing coverage sets and scales to complex domains.

Findings

01. Reduces the number of base policies needed for optimality.

02. Enables solving certain non-linear tasks optimally.

03. Outperforms state-of-the-art approaches in complex domains.

Abstract

Multi-task reinforcement learning aims to quickly identify solutions for new tasks with minimal or no additional interaction with the environment. Generalized Policy Improvement (GPI) addresses this by combining a set of base policies to produce a new one that is at least as good -- though not necessarily optimal -- as any individual base policy. Optimality can be ensured, particularly in the linear-reward case, via techniques that compute a Convex Coverage Set (CCS). However, these are computationally expensive and do not scale to complex domains. The Option Keyboard (OK) improves upon GPI by producing policies that are at least as good -- and often better. It achieves this through a learned meta-policy that dynamically combines base policies. However, its performance critically depends on the choice of base policies. This raises a key question: is there an optimal set of base policies…
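The GPI step described in the abstract can be illustrated with a minimal sketch. It assumes the common successor-feature formulation of the linear-reward case, where each base policy's action values at a state are a dot product between its successor features and a task weight vector `w`; that formulation, the array layout, and the names `gpi_action`, `ok_action`, and `meta_policy` are assumptions made for illustration, not details taken from this listing. The second function shows, under the same assumptions, how an Option Keyboard style meta-policy might pick `w` per state before applying GPI.

```python
import numpy as np


def gpi_action(psi, w):
    """Pick an action by Generalized Policy Improvement at one state.

    psi: array of shape (n_policies, n_actions, d), holding the (assumed)
         successor features of each base policy at the current state.
    w:   array of shape (d,), the linear reward weights of the task.
    """
    q = psi @ w                    # (n_policies, n_actions) action values
    best_over_policies = q.max(axis=0)  # best value per action across policies
    return int(np.argmax(best_over_policies))


def ok_action(psi, meta_policy, state):
    """Option Keyboard style selection (hypothetical interface).

    meta_policy is any callable mapping a state to a weight vector of
    length d; GPI is then applied with those weights.
    """
    w = np.asarray(meta_policy(state), dtype=float)
    return gpi_action(psi, w)


# Toy example: two base policies, two actions, two reward features.
psi = np.array([
    [[1.0, 0.0], [0.2, 0.1]],  # base policy 0
    [[0.0, 0.3], [0.1, 1.0]],  # base policy 1
])

action_balanced = gpi_action(psi, np.array([0.5, 0.5]))
action_first_feature = gpi_action(psi, np.array([1.0, 0.0]))
action_ok = ok_action(psi, lambda s: [0.0, 1.0], state=None)
```

In this toy setup, changing `w` changes which base policy's estimate dominates for each action, which is why the choice of base policies matters for the quality of the combined policy.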

Videos

Constructing an Optimal Behavior Basis for the Option Keyboard · SlidesLive

Taxonomy

Topics: ICT Impact and Policies · Digital Platforms and Economics · Economic theories and models

Methods: Balanced Selection · Sparse Evolutionary Training