Utilizing Novelty-based Evolution Strategies to Train Transformers in Reinforcement Learning
Matyáš Lorenc, Roman Neruda

TL;DR
This paper evaluates novelty-based evolution strategies (NS-ES and NSR-ES) for training transformer models in reinforcement learning, and tests whether seeding the training with pretrained models can accelerate it. The experimental results were mixed.
Contribution
It introduces and evaluates novelty-based variants of OpenAI-ES for training large transformer models in reinforcement learning.
Findings
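NS-ES showed progress but would need many more iterations to yield interesting agents.
NSR-ES could be applied straightforwardly to larger models: its performance on the feed-forward model and on the Decision Transformer was about as similar as it was for OpenAI-ES in the authors' previous work.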
Seeding training with pretrained models can potentially accelerate learning.
Abstract
In this paper, we experiment with novelty-based variants of OpenAI-ES, the NS-ES and NSR-ES algorithms, and evaluate their effectiveness in training complex, transformer-based architectures designed for the problem of reinforcement learning, such as Decision Transformers. We also test whether we can accelerate the novelty-based training of these larger models by seeding the training with pretrained models. The experimental results were mixed. NS-ES showed progress, but it would clearly need many more iterations to yield interesting agents. NSR-ES, on the other hand, proved quite capable of being used straightforwardly on larger models, since its performance on the feed-forward model and on the Decision Transformer appears as similar as it was for OpenAI-ES in our previous work.
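As a rough illustration of the two algorithms named in the abstract, the sketch below follows how NS-ES and NSR-ES are commonly described: novelty is the mean distance from an agent's behavior characterization to its k nearest neighbors in an archive, and NSR-ES weights each perturbation by the average of its rank-normalized reward and novelty. It is not the authors' implementation. The function names, the Euclidean distance, the rank normalization, and the default hyperparameters (`k`, `sigma`, `lr`) are assumptions made for this example.

```python
import numpy as np


def novelty(bc, archive, k=10):
    """Mean Euclidean distance from `bc` to its k nearest archive entries.

    `bc` is a behavior characterization vector; `archive` is a sequence of
    previously seen characterizations (both hypothetical inputs).
    """
    archive = np.asarray(archive, dtype=float)
    dists = np.linalg.norm(archive - np.asarray(bc, dtype=float), axis=1)
    k = min(k, len(dists))
    return float(np.sort(dists)[:k].mean())


def centered_ranks(scores):
    """Replace scores by their ranks, rescaled to the range [-0.5, 0.5]."""
    scores = np.asarray(scores, dtype=float)
    ranks = np.empty(len(scores), dtype=float)
    ranks[np.argsort(scores)] = np.arange(len(scores))
    if len(scores) > 1:
        ranks /= len(scores) - 1
    return ranks - 0.5


def nsr_es_step(theta, noise, rewards, novelties, sigma=0.02, lr=0.01):
    """One NSR-ES-style update of the parameter vector `theta`.

    `noise` has shape (population, n_params); row i is the perturbation
    whose rollout produced rewards[i] and novelties[i].
    """
    # Average the two rank-normalized signals (assumed equal weighting).
    weights = 0.5 * (centered_ranks(rewards) + centered_ranks(novelties))
    # ES gradient estimate: weighted sum of perturbations.
    grad = noise.T @ weights / (len(weights) * sigma)
    return theta + lr * grad
```

Under the same assumptions, an NS-ES-style step would use `centered_ranks(novelties)` alone as the weights, so the reward never enters the update. That is consistent with the abstract's observation that NS-ES needs many more iterations before useful agents appear.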
Taxonomy
Topics: Evolutionary Algorithms and Applications
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Absolute Position Encodings · Residual Connection · Adam · Layer Normalization · Label Smoothing · Position-Wise Feed-Forward Layer
