Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($\lambda$,$\lambda$))-GA
Tai Nguyen, Phong Le, André Biedenkapp, Carola Doerr, Nguyen Dang

TL;DR
This study evaluates deep reinforcement learning algorithms for dynamic algorithm configuration on a simple optimization problem, identifying key challenges and proposing solutions to improve learning stability and exploration.
Contribution
It introduces an adaptive reward shifting mechanism for DDQN, demonstrating improved sample efficiency and stability in controlling the $(1+(\lambda,\lambda))$-GA on OneMax.
Findings
DDQN with adaptive reward shifting matches theoretical policies' performance
PPO faces variance issues and requires hyperparameter tuning for stability
Standard deep-RL approaches struggle with scalability and exploration in DAC
Abstract
Dynamic Algorithm Configuration (DAC) studies the efficient identification of control policies for parameterized optimization algorithms. Numerous studies leverage Reinforcement Learning (RL) to address DAC challenges; however, applying RL often requires extensive domain expertise. In this work, we conduct a comprehensive study of two deep-RL algorithms--Double Deep Q-Networks (DDQN) and Proximal Policy Optimization (PPO)--for controlling the population size of the $(1+(\lambda,\lambda))$-GA on OneMax instances. Although OneMax is structurally simple, learning effective control policies for the $(1+(\lambda,\lambda))$-GA induces a highly challenging DAC landscape, making it a controlled yet demanding benchmark. Our investigation reveals two fundamental challenges limiting DDQN and PPO: scalability degradation and learning instability, traced to under-exploration and planning horizon…
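To make the control problem in the abstract concrete, the following is a minimal Python sketch of the $(1+(\lambda,\lambda))$-GA on OneMax in which a policy chooses the population size $\lambda$ from the current fitness at every iteration. It is an illustration under stated assumptions, not the paper's benchmark code: it uses the standard parameterization (mutation rate $\lambda/n$, crossover bias $1/\lambda$), rounds $\lambda$ to the nearest integer, and counts $2\lambda$ evaluations per iteration. The function names `onemax`, `theory_policy`, and `run_ga` are hypothetical. The example policy is the fitness-dependent setting $\lambda = \sqrt{n/(n - f(x))}$ known from the theory literature; a learned DDQN or PPO policy would take its place as the `policy` argument.

```python
import math
import random


def onemax(x):
    """OneMax fitness: number of one-bits."""
    return sum(x)


def theory_policy(n, fitness):
    """Fitness-dependent setting lambda = sqrt(n / (n - f(x)))."""
    return math.sqrt(n / (n - fitness))


def run_ga(n, policy, rng, max_evals=10**6):
    """Run the (1+(lambda,lambda))-GA; return (final fitness, evaluations used).

    Assumed parameterization: mutation rate p = lambda/n, crossover bias c = 1/lambda.
    """
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = onemax(x)
    evals = 1
    while fx < n and evals < max_evals:
        # The policy maps the current state (here: fitness) to a population size.
        lam = max(1, round(policy(n, fx)))

        # Mutation phase: draw ell ~ Bin(n, lam/n), then create lam offspring,
        # each flipping exactly ell distinct bits of the parent.
        ell = sum(rng.random() < lam / n for _ in range(n))
        best_mut, best_mut_f = None, -1
        for _ in range(lam):
            y = x[:]
            for i in rng.sample(range(n), ell):
                y[i] ^= 1
            fy = onemax(y)
            if fy > best_mut_f:
                best_mut, best_mut_f = y, fy
        evals += lam

        # Crossover phase: each bit comes from the best mutant with
        # probability 1/lam, otherwise from the parent.
        best_cross, best_cross_f = None, -1
        for _ in range(lam):
            y = [m if rng.random() < 1.0 / lam else p
                 for p, m in zip(x, best_mut)]
            fy = onemax(y)
            if fy > best_cross_f:
                best_cross, best_cross_f = y, fy
        evals += lam

        # Elitist selection: accept the crossover winner if it is not worse.
        if best_cross_f >= fx:
            x, fx = best_cross, best_cross_f
    return fx, evals


if __name__ == "__main__":
    final_fitness, used = run_ga(100, theory_policy, random.Random(0))
    print(f"final fitness: {final_fitness}, evaluations: {used}")
```

In a DAC setting, the number of evaluations used until the optimum is found is the natural cost signal, so a policy is better the smaller the second return value is on average over many runs.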
