Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection

Jiajun Fan; Yuzheng Zhuang; Yuecheng Liu; Jianye Hao; Bin Wang; Jiangcheng Zhu; Hao Wang; Shu-Tao Xia

arXiv:2305.05239 · cs.LG · October 28, 2025 · 1 citation

TL;DR

This paper introduces Learnable Behavioral Control (LBC), a framework that significantly enlarges the behavior selection space of population-based deep RL agents, enabling them to surpass human world records in Atari games without sacrificing sample efficiency.

Contribution

It proposes a hybrid behavior mapping built from all policies in the population, together with a unified learnable process for behavior selection, which increases behavior diversity and improves agent performance.
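To make the idea of a hybrid behavior mapping concrete, here is a minimal sketch under stated assumptions. The exact form of the mapping is not given on this page, so the hypothetical `hybrid_behavior` below mixes Boltzmann (softmax) policies derived from every population member's action values, treating the mixture weights and per-policy temperatures as the behavior parameters a selector would choose. It illustrates why mapping over all policies enlarges the selection space; it is not the paper's exact parameterization.

```python
import math
from typing import Sequence


def softmax(values: Sequence[float], temperature: float) -> list[float]:
    """Boltzmann distribution over actions at the given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / temperature for v in values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_behavior(
    action_values: Sequence[Sequence[float]],  # one action-value vector per population policy
    weights: Sequence[float],                  # hypothetical mixture weights over policies
    temperatures: Sequence[float],             # hypothetical per-policy temperatures
) -> list[float]:
    """Mix every policy in the population into one behavior distribution.

    The (weights, temperatures) pair acts as the selectable behavior:
    instead of picking one policy from a fixed population, a selector
    picks a point in this continuous mixing space.
    """
    if not (len(action_values) == len(weights) == len(temperatures)):
        raise ValueError("need one weight and one temperature per policy")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    n_actions = len(action_values[0])
    mixed = [0.0] * n_actions
    for values, w, tau in zip(action_values, weights, temperatures):
        probs = softmax(values, tau)
        for a in range(n_actions):
            mixed[a] += (w / total_w) * probs[a]
    return mixed


# Usage with made-up action values: a 2-policy population over 3 actions.
q_values = [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
behavior = hybrid_behavior(q_values, weights=[0.5, 0.5], temperatures=[1.0, 0.5])
print([round(p, 3) for p in behavior])
```

Setting one weight to 1 and the rest to 0 recovers the single-policy choice a predefined population offers, so in this sketch the continuous (weights, temperatures) space strictly contains the original discrete options.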

Findings

01. Achieved a mean human-normalized score of 10077.52%

02. Surpassed 24 human world records in Atari games

03. Improved performance without degrading sample efficiency
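The headline figure is a mean human-normalized score (HNS), the standard Arcade Learning Environment metric: for each game, HNS = 100 × (agent - random) / (human - random), where 0% matches random play and 100% matches the human baseline, and the per-game values are then averaged. A minimal sketch follows; the game names and scores are hypothetical placeholders, not data from the paper.

```python
from statistics import mean


def human_normalized_score(agent_score: float, random_score: float, human_score: float) -> float:
    """Return HNS in percent: 0% matches random play, 100% matches the human baseline."""
    if human_score == random_score:
        raise ValueError("human and random baselines must differ")
    return 100.0 * (agent_score - random_score) / (human_score - random_score)


def mean_hns(results: dict[str, tuple[float, float, float]]) -> float:
    """Average HNS over games; each value is an (agent, random, human) raw-score tuple."""
    return mean(human_normalized_score(*scores) for scores in results.values())


# Hypothetical raw scores for two made-up games (not data from the paper).
example = {
    "game_a": (1100.0, 100.0, 600.0),  # HNS = 200%
    "game_b": (350.0, 100.0, 600.0),   # HNS = 50%
}
print(mean_hns(example))  # 125.0
```

Because the mean is an average of per-game ratios, a few games with very large normalized scores can dominate it, which is why ALE results are commonly reported with the median HNS alongside the mean.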

Abstract

The exploration problem is one of the main challenges in deep reinforcement learning (RL). Recent promising works tried to handle the problem with population-based methods, which collect samples with diverse behaviors derived from a population of different exploratory policies. Adaptive policy selection has been adopted for behavior control. However, the behavior selection space is largely limited by the predefined policy population, which further limits behavior diversity. In this paper, we propose a general framework called Learnable Behavioral Control (LBC) to address the limitation, which a) enables a significantly enlarged behavior selection space via formulating a hybrid behavior mapping from all policies; b) constructs a unified learnable process for behavior selection. We introduce LBC into distributed off-policy actor-critic methods and achieve behavior control via optimizing…
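The abstract breaks off before describing how the behavior selection is optimized. Purely as an illustrative assumption, the sketch below frames selection as a multi-armed bandit over a discretized set of candidate behavior parameters, a common choice for adaptive behavior control in population-based agents (Agent57, for example, uses a sliding-window UCB meta-controller). The class `UCBBehaviorSelector`, the candidate tuples, and the toy returns are all hypothetical.

```python
import math


class UCBBehaviorSelector:
    """UCB1-style bandit over a discrete set of candidate behavior parameters.

    Each arm is one hypothetical behavior setting (e.g. a (weights, temperatures)
    pair); the reward for pulling an arm is the episode return it produced.
    With c = sqrt(2) the exploration bonus matches textbook UCB1.
    """

    def __init__(self, arms, c: float = 1.0):
        self.arms = list(arms)
        self.c = c
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)
        self.t = 0

    def select(self) -> int:
        """Return the index of the arm to use for the next episode."""
        self.t += 1
        # Try every arm once before relying on confidence bounds.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        scores = [
            m + self.c * math.sqrt(math.log(self.t) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, episode_return: float) -> None:
        """Record the return obtained with `arm` (incremental mean update)."""
        self.counts[arm] += 1
        self.means[arm] += (episode_return - self.means[arm]) / self.counts[arm]


# Toy usage: three hypothetical (weights, temperatures) candidates with fixed,
# made-up returns; the selector should concentrate on the second candidate.
candidates = [((1.0, 0.0), (1.0, 1.0)), ((0.5, 0.5), (1.0, 0.5)), ((0.0, 1.0), (0.5, 0.5))]
toy_returns = [0.1, 0.9, 0.5]
selector = UCBBehaviorSelector(candidates, c=1.0)
for _ in range(200):
    arm = selector.select()
    selector.update(arm, toy_returns[arm])
print(selector.counts)
```

Plain UCB1 assumes each arm's reward distribution is fixed. During training the underlying policies keep changing, so a non-stationary variant such as sliding-window UCB is usually the better fit in practice.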

Videos

Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Machine Learning and Data Classification