On the Importance of Reward Design in Reinforcement Learning-based Dynamic Algorithm Configuration: A Case Study on OneMax with (1+($\lambda$,$\lambda$))-GA

Tai Nguyen; Phong Le; André Biedenkapp; Carola Doerr; Nguyen Dang

arXiv:2502.20265·cs.LG·July 10, 2025

TL;DR

This paper emphasizes the critical role of reward design in reinforcement learning for dynamic algorithm configuration, demonstrating how reward shaping improves exploration, scalability, and learning effectiveness when controlling the population size of the (1+($\lambda$,$\lambda$))-GA on the OneMax problem.

Contribution

The study highlights the importance of reward shaping in RL-based DAC and introduces a reward mechanism that enhances exploration and scalability when configuring the (1+($\lambda$,$\lambda$))-GA.

Findings

01. Reward shaping improves RL exploration in DAC.

02. Poor reward design causes learning divergence.

03. Reward shaping enhances scalability across problem sizes.

Abstract

Dynamic Algorithm Configuration (DAC) has garnered significant attention in recent years, particularly given the prevalence of machine learning and deep learning algorithms. Numerous studies have leveraged the robustness of decision-making in Reinforcement Learning (RL) to address the optimization challenges associated with algorithm configuration. However, making an RL agent work properly is a non-trivial task, especially with respect to reward design, which requires a substantial amount of handcrafted knowledge based on domain expertise. In this work, we study the importance of reward design in the context of DAC via a case study on controlling the population size of the $(1+(\lambda,\lambda))$-GA optimizing OneMax. We observed that a poorly designed reward can hinder the RL agent's ability to learn an optimal policy because of a lack of exploration, leading to both scalability and learning…
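
To make the setting in the abstract concrete, the following is a minimal sketch of the textbook $(1+(\lambda,\lambda))$-GA on OneMax, with the population size $\lambda$ chosen at every iteration by a pluggable policy. It is not taken from the paper or its repository: the function names (`onemax`, `theory_policy`, `one_plus_lambda_lambda_ga`), the evaluation budget, and the simplified accounting of $2\lambda$ evaluations per iteration are all assumptions made for illustration. The example policy $\lambda = \sqrt{n/(n - f(x))}$ is the fitness-dependent choice commonly cited from the theory literature, rounded to an integer here; in RL-based DAC, a learned policy would take its place.

```python
import math
import random


def onemax(bits):
    """OneMax fitness: the number of one-bits in the bit string."""
    return sum(bits)


def theory_policy(n, fitness):
    """Illustrative policy (assumption): lambda = sqrt(n / (n - f(x))), rounded.

    Only called while fitness < n, so the denominator is never zero.
    """
    return max(1, round(math.sqrt(n / (n - fitness))))


def one_plus_lambda_lambda_ga(n, policy, rng, max_evals=100_000):
    """Run the (1+(lambda,lambda))-GA on OneMax; return (best fitness, evaluations)."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = onemax(x)
    evals = 1
    while fx < n and evals < max_evals:
        lam = policy(n, fx)  # dynamic configuration: lambda depends on the state

        # Mutation phase: draw ell ~ Bin(n, lam / n), then create lam offspring,
        # each flipping exactly ell distinct bits of the parent.
        ell = sum(rng.random() < lam / n for _ in range(n))
        best_mut, best_mut_f = None, -1
        for _ in range(lam):
            child = x[:]
            for i in rng.sample(range(n), ell):
                child[i] ^= 1
            f = onemax(child)
            if f > best_mut_f:
                best_mut, best_mut_f = child, f

        # Crossover phase: biased uniform crossover, taking each bit from the
        # best mutant with probability 1 / lam and from the parent otherwise.
        best_cross, best_cross_f = None, -1
        for _ in range(lam):
            child = [m if rng.random() < 1 / lam else p
                     for p, m in zip(x, best_mut)]
            f = onemax(child)
            if f > best_cross_f:
                best_cross, best_cross_f = child, f

        evals += 2 * lam  # simplification: every offspring counts as one evaluation

        # Elitist selection: accept the crossover winner if it is not worse.
        if best_cross_f >= fx:
            x, fx = best_cross, best_cross_f
    return fx, evals


if __name__ == "__main__":
    best, used = one_plus_lambda_lambda_ga(50, theory_policy, random.Random(0))
    print(f"best fitness: {best}, evaluations used: {used}")
```

In this sketch the policy sees only the problem size and the current fitness. An RL agent in a DAC setup would be trained to output $\lambda$ from such a state, and the reward it receives for each choice is exactly the design question the paper studies.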

Code & Models

Repositories

taindp98/OneMax-DAC
pytorch

Taxonomy

Methods: Softmax · Attention Is All You Need · Dynamic Algorithm Configuration