Utilizing Novelty-based Evolution Strategies to Train Transformers in Reinforcement Learning

Matyáš Lorenc; Roman Neruda

arXiv:2502.06301·cs.LG·September 18, 2025

TL;DR

This paper applies the novelty-based evolution strategies NS-ES and NSR-ES to training transformer models for reinforcement learning, and tests whether seeding the training with a pretrained model accelerates it. The experimental results are mixed.

Contribution

It evaluates NS-ES and NSR-ES, the novelty-based variants of OpenAI-ES, for training transformer-based reinforcement learning architectures such as Decision Transformers.

Findings

01

NS-ES showed progress but would need many more iterations to yield interesting agents.

02

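NSR-ES could be applied directly to larger models: its performance on the feed-forward model and the Decision Transformer was about as similar as it was for OpenAI-ES in the authors' previous work.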

03

Seeding training with pretrained models can potentially accelerate learning.

Abstract

In this paper, we experiment with novelty-based variants of OpenAI-ES, the NS-ES and NSR-ES algorithms, and evaluate their effectiveness in training complex, transformer-based architectures designed for reinforcement learning, such as Decision Transformers. We also test whether we can accelerate the novelty-based training of these larger models by seeding the training with a pretrained model. The experimental results were mixed. NS-ES showed progress, but it would clearly need many more iterations to yield interesting agents. NSR-ES, on the other hand, proved quite capable of being used straightforwardly on larger models: its performance on the feed-forward model and the Decision Transformer appears about as similar as it was for OpenAI-ES in our previous work.
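
For readers unfamiliar with the two variants named in the abstract, the following is a minimal sketch of how they are commonly described. An OpenAI-ES style update perturbs the parameters with Gaussian noise and moves along the noise directions weighted by a score. NS-ES uses novelty as that score, here taken as the mean distance from an agent's behavior characterization to its k nearest neighbours in an archive. NSR-ES uses an equal-weight average of reward and novelty. Everything specific in the code is an assumption made for illustration and is not taken from the paper: the function names, the `evaluate` callback, the toy task, the hyperparameters, the use of plain gradient ascent instead of an optimizer such as Adam, and the omission of the meta-population of agents.

```python
import numpy as np


def novelty(bc, archive, k=10):
    """Mean Euclidean distance from `bc` to its k nearest neighbours in `archive`."""
    dists = np.linalg.norm(archive - bc, axis=1)
    k = min(k, len(dists))
    return float(np.sort(dists)[:k].mean())


def centered_ranks(x):
    """Replace values by their ranks, scaled into [-0.5, 0.5]."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x), dtype=float)
    ranks[np.argsort(x)] = np.arange(len(x))
    if len(x) > 1:
        ranks /= len(x) - 1
    return ranks - 0.5


def es_step(theta, evaluate, archive, rng, mode="nsr",
            n=32, sigma=0.1, lr=0.05, k=10):
    """One update. `evaluate(params)` must return (reward, behavior_vector).

    mode="reward": reward only (OpenAI-ES style)
    mode="ns":     novelty only (NS-ES style)
    mode="nsr":    equal-weight mix of reward and novelty (NSR-ES style)
    """
    eps = rng.standard_normal((n, theta.size))  # one noise vector per sample
    rewards = np.empty(n)
    novelties = np.empty(n)
    for i in range(n):
        reward, bc = evaluate(theta + sigma * eps[i])
        rewards[i] = reward
        novelties[i] = novelty(bc, archive, k)

    if mode == "ns":
        weights = centered_ranks(novelties)
    elif mode == "nsr":
        weights = 0.5 * (centered_ranks(rewards) + centered_ranks(novelties))
    else:
        weights = centered_ranks(rewards)

    grad = weights @ eps / (n * sigma)  # score-weighted sum of noise directions
    return theta + lr * grad            # assumption: plain ascent, no Adam


if __name__ == "__main__":
    # Hypothetical toy task: reward is the negative squared distance to a
    # target, and the behavior characterization is the first two parameters.
    target = np.ones(4)

    def evaluate(params):
        return -float(np.sum((params - target) ** 2)), params[:2].copy()

    rng = np.random.default_rng(0)
    theta = np.zeros(4)
    archive = np.zeros((1, 2))  # archive seeded with one behavior vector
    for _ in range(50):
        theta = es_step(theta, evaluate, archive, rng, mode="nsr")
        _, bc = evaluate(theta)
        archive = np.vstack([archive, bc])  # remember the new behavior
    print(theta)
```

Rank normalization is used here so that reward and novelty, which live on different scales, can be averaged directly. For a real transformer policy, `theta` would be the flattened network weights and `evaluate` would run environment episodes.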

Taxonomy

Topics: Evolutionary Algorithms and Applications

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Absolute Position Encodings · Residual Connection · Adam · Layer Normalization · Label Smoothing · Position-Wise Feed-Forward Layer