K-Myriad: Jump-starting reinforcement learning with unsupervised parallel agents

Vincenzo De Paola; Mirco Mutti; Riccardo Zamboni; Marcello Restelli

arXiv:2601.18580 · cs.LG · January 27, 2026

TL;DR

K-Myriad is a scalable, unsupervised method that trains parallel agents to explore in diverse ways. This provides an initialization that makes subsequent reinforcement learning more efficient and yields more diverse solutions.

Contribution

The paper presents K-Myriad, a method that maximizes the collective state entropy of a population of parallel policies to cultivate diverse exploration strategies in parallel reinforcement learning.
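
The objective is not written out in symbols here. One hedged way to write it assumes K parallel policies π_1, ..., π_K, where each policy induces a state distribution d^{π_i}, and takes the collective distribution to be their uniform mixture. Both choices are assumptions made for illustration, not the paper's stated notation:

```latex
\max_{\pi_1,\dots,\pi_K} \; H\!\left(\bar d\right),
\qquad \bar d(s) = \frac{1}{K}\sum_{i=1}^{K} d^{\pi_i}(s),
\qquad H(d) = -\int d(s)\,\log d(s)\,\mathrm{d}s

H\!\left(\bar d\right) = \frac{1}{K}\sum_{i=1}^{K} H\!\left(d^{\pi_i}\right)
  + \mathrm{JSD}\!\left(d^{\pi_1},\dots,d^{\pi_K}\right)
```

The second line is a standard identity for uniform mixtures. The collective entropy equals the agents' average individual entropy plus their generalized Jensen-Shannon divergence. Under this formulation, a population therefore scores higher both when each agent covers more states and when the agents cover different states.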

Findings

01. Enables learning of a broad set of distinct policies.
02. Improves training efficiency in high-dimensional tasks.
03. Facilitates the discovery of heterogeneous solutions.

Abstract

Parallelization in Reinforcement Learning is typically employed to speed up the training of a single policy, where multiple workers collect experience from an identical sampling distribution. This common design limits the potential of parallelization by neglecting the advantages of diverse exploration strategies. We propose K-Myriad, a scalable and unsupervised method that maximizes the collective state entropy induced by a population of parallel policies. By cultivating a portfolio of specialized exploration strategies, K-Myriad provides a robust initialization for Reinforcement Learning, leading to both higher training efficiency and the discovery of heterogeneous solutions. Experiments on high-dimensional continuous control tasks, with large-scale parallelization, demonstrate that K-Myriad can learn a broad set of distinct policies, highlighting its effectiveness for collective…
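
The abstract does not say how collective state entropy is estimated. A common choice in the state-entropy literature is a non-parametric k-nearest-neighbor (Kozachenko-Leonenko) estimator. The minimal sketch below assumes that estimator and treats the population's collective distribution as the pooled states of all agents. The names `knn_entropy`, `collective_entropy` and `digamma_int` are hypothetical and are not taken from the paper's code. The sketch only measures the quantity; it does not show how K-Myriad optimizes the policies.

```python
import math

import numpy as np

# ASSUMPTION: k-NN (Kozachenko-Leonenko) entropy estimation over pooled agent
# states; the paper's actual estimator and training loop may differ.

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma at a positive integer n: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy(states: np.ndarray, k: int = 4) -> float:
    """Kozachenko-Leonenko k-NN estimate of differential entropy, in nats.

    states: array of shape (N, d) with samples from one state distribution.
    """
    n, d = states.shape
    if n <= k:
        raise ValueError("need more samples than neighbors (N > k)")
    # Pairwise squared Euclidean distances via |a|^2 + |b|^2 - 2 a.b (O(N^2) memory).
    sq = (states ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * states @ states.T
    dists = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    # Distance from every sample to its k-th nearest neighbor.
    r_k = np.partition(dists, k - 1, axis=1)[:, k - 1]
    r_k = np.maximum(r_k, 1e-12)  # avoid log(0) when states repeat exactly
    # Log-volume of the d-dimensional Euclidean unit ball.
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return (digamma_int(n) - digamma_int(k) + log_unit_ball
            + d * float(np.mean(np.log(r_k))))


def collective_entropy(per_agent_states: list[np.ndarray], k: int = 4) -> float:
    """Entropy of the pooled states of all agents (their uniform mixture when
    every agent contributes the same number of samples)."""
    return knn_entropy(np.concatenate(per_agent_states, axis=0), k=k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four agents covering disjoint unit squares vs. four agents sharing one square.
    diverse = [rng.uniform([i, 0.0], [i + 1.0, 1.0], size=(500, 2)) for i in range(4)]
    identical = [rng.uniform([0.0, 0.0], [1.0, 1.0], size=(500, 2)) for _ in range(4)]
    print("diverse  :", collective_entropy(diverse))
    print("identical:", collective_entropy(identical))
```

A uniform density on an area of 4 has entropy log 4 ≈ 1.39 nats, and one on an area of 1 has entropy 0. The diverse population should therefore land near the first value and the identical one near the second. Finite-sample and boundary bias keep both estimates approximate. This mirrors the abstract's contrast between workers sampling from an identical distribution and a population of specialized explorers. The naive pairwise-distance step is quadratic in the number of states, so a KD-tree would be the usual replacement at scale.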

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques