Simultaneous Training of First- and Second-Order Optimizers in Population-Based Reinforcement Learning

Felix Pfeiffer; Shahram Eivazi

arXiv:2408.15421·cs.LG·September 5, 2024

TL;DR

This paper introduces a novel approach to population-based reinforcement learning by simultaneously training first- and second-order optimizers, leading to improved performance and stability across various environments.

Contribution

It is the first to empirically demonstrate the benefits of integrating second-order optimizers like K-FAC into population-based RL training.

Findings

01. Up to 10% performance improvement with combined optimizers.

02. Enhanced training stability in challenging environments.

03. Reliable learning outcomes with mixed optimizer populations.

Abstract

The tuning of hyperparameters in reinforcement learning (RL) is critical, as these parameters significantly impact an agent's performance and learning efficiency. Dynamic adjustment of hyperparameters during the training process can significantly enhance both the performance and stability of learning. Population-based training (PBT) provides a method to achieve this by continuously tuning hyperparameters throughout the training. This ongoing adjustment enables models to adapt to different learning stages, resulting in faster convergence and overall improved performance. In this paper, we propose an enhancement to PBT by simultaneously utilizing both first- and second-order optimizers within a single population. We conducted a series of experiments using the TD3 algorithm across various MuJoCo environments. Our results, for the first time, empirically demonstrate the potential of…
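
The exploit-and-explore loop that PBT relies on can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: the quadratic objective, the `Member` class, the learning-rate bounds, and the perturbation factors are all hypothetical, and a plain gradient step and a diagonal Newton-style step stand in for Adam and K-FAC. It also assumes that a replaced member copies the winner's parameters and learning rate but keeps its own optimizer kind, so the population stays mixed; the paper may handle this differently.

```python
import random
from dataclasses import dataclass

# Toy objective f(w) = 0.5 * sum(c_i * w_i^2): gradient c_i * w_i, diagonal Hessian c_i.
CURVATURE = [1.0, 10.0]
MAX_LR = 0.15  # hypothetical cap that keeps both update rules stable on this objective


def loss(w):
    return 0.5 * sum(c * x * x for c, x in zip(CURVATURE, w))


@dataclass
class Member:
    kind: str   # "first" (gradient step) or "second" (Newton-style step)
    lr: float   # the hyperparameter tuned by PBT
    w: list     # model parameters


def train_step(m):
    grad = [c * x for c, x in zip(CURVATURE, m.w)]
    if m.kind == "second":
        # Precondition with the inverse diagonal Hessian (stand-in for K-FAC).
        grad = [g / c for g, c in zip(grad, CURVATURE)]
    m.w = [x - m.lr * g for x, g in zip(m.w, grad)]


def exploit_and_explore(pop, rng, frac=0.25):
    ranked = sorted(pop, key=lambda m: loss(m.w))
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: copy parameters and learning rate; the optimizer kind is kept.
        loser.w = list(winner.w)
        # Explore: perturb the copied learning rate, clamped to the safe range.
        loser.lr = min(winner.lr * rng.choice([0.8, 1.2]), MAX_LR)


def run_pbt(pop_size=8, generations=10, steps=5, seed=0):
    rng = random.Random(seed)
    pop = [
        Member("first" if i % 2 == 0 else "second", rng.uniform(0.01, 0.09), [1.0, 1.0])
        for i in range(pop_size)
    ]
    for _ in range(generations):
        for m in pop:
            for _ in range(steps):
                train_step(m)
        exploit_and_explore(pop, rng)
    return pop


if __name__ == "__main__":
    for m in sorted(run_pbt(), key=lambda m: loss(m.w)):
        print(m.kind, round(m.lr, 4), loss(m.w))
```

In an actual RL setting, `train_step` would be replaced by TD3 updates and `loss` by an evaluation return, with the ranking direction flipped accordingly.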

Taxonomy

Topics: Evolutionary Algorithms and Applications · Metaheuristic Optimization Algorithms Research · Reinforcement Learning in Robotics

Methods: Dense Connections · Target Policy Smoothing · Clipped Double Q-learning · Experience Replay · Adam · Twin Delayed Deep Deterministic