Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($\lambda$,$\lambda$))-GA

Tai Nguyen; Phong Le; André Biedenkapp; Carola Doerr; Nguyen Dang

arXiv:2512.03805·cs.LG·April 3, 2026

TL;DR

This study evaluates two deep reinforcement learning algorithms, DDQN and PPO, for dynamic algorithm configuration on OneMax with the $(1+(\lambda,\lambda))$-GA, identifying scalability and learning-stability challenges and proposing solutions that improve stability and exploration.

Contribution

It introduces an adaptive reward shifting mechanism for DDQN, demonstrating improved sample efficiency and stability in controlling the $(1+(\lambda,\lambda))$-GA on OneMax.

Findings

01

DDQN with adaptive reward shifting matches theoretical policies' performance

02

PPO faces variance issues and requires hyperparameter tuning for stability

03

Standard deep-RL approaches struggle with scalability and exploration in DAC

Abstract

Dynamic Algorithm Configuration (DAC) studies the efficient identification of control policies for parameterized optimization algorithms. Numerous studies leverage Reinforcement Learning (RL) to address DAC challenges; however, applying RL often requires extensive domain expertise. In this work, we conduct a comprehensive study of two deep-RL algorithms, Double Deep Q-Networks (DDQN) and Proximal Policy Optimization (PPO), for controlling the population size of the $(1+(\lambda,\lambda))$-GA on OneMax instances. Although OneMax is structurally simple, learning effective control policies for the $(1+(\lambda,\lambda))$-GA induces a highly challenging DAC landscape, making it a controlled yet demanding benchmark. Our investigation reveals two fundamental challenges limiting DDQN and PPO: scalability degradation and learning instability, traced to under-exploration and planning horizon…
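To make the control problem concrete, the following is a minimal Python sketch of the $(1+(\lambda,\lambda))$-GA on OneMax with the population size $\lambda$ chosen anew in every iteration by a policy. It is not the authors' code. The names `onemax`, `theory_policy`, and `run_ga` are hypothetical, and the sketch assumes the usual parameter coupling from the literature (mutation rate $\lambda/n$, crossover bias $1/\lambda$) and a policy that observes only the current fitness. `theory_policy` uses the well-known fitness-dependent setting $\lambda = \sqrt{n/(n - f(x))}$, rounded to an integer; a learned DDQN or PPO policy would take its place.

```python
import math
import random


def onemax(bits):
    """OneMax fitness: the number of one-bits."""
    return sum(bits)


def theory_policy(n, fitness):
    """Fitness-dependent setting lambda = sqrt(n / (n - f(x))), rounded, at least 1."""
    return max(1, round(math.sqrt(n / (n - fitness))))


def run_ga(n, policy, rng):
    """Run the GA until the all-ones string is found; return (solution, evaluations)."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = onemax(x)
    evals = 1
    while fx < n:
        lam = policy(n, fx)  # the DAC decision: population size for this iteration

        # Mutation phase: draw ell ~ Bin(n, lam / n), then create lam mutants,
        # each flipping exactly ell distinct random bits; keep the best one.
        # (This sketch does not resample when ell == 0.)
        ell = sum(rng.random() < lam / n for _ in range(n))
        mutant, mutant_f = None, -1
        for _ in range(lam):
            y = x[:]
            for i in rng.sample(range(n), ell):
                y[i] ^= 1
            fy = onemax(y)
            if fy > mutant_f:
                mutant, mutant_f = y, fy

        # Crossover phase: each bit comes from the best mutant with
        # probability 1 / lam, otherwise from the parent; keep the best child.
        child, child_f = None, -1
        for _ in range(lam):
            y = [m if rng.random() < 1.0 / lam else p for p, m in zip(x, mutant)]
            fy = onemax(y)
            if fy > child_f:
                child, child_f = y, fy

        evals += 2 * lam
        # Elitist selection: accept the child if it is at least as good.
        if child_f >= fx:
            x, fx = child, child_f
    return x, evals
```

Calling `run_ga(100, theory_policy, random.Random(0))` returns the all-ones string together with the number of fitness evaluations used, the quantity a control policy in this setting would typically aim to minimize.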
