Combining Automated Optimisation of Hyperparameters and Reward Shape

Julian Dierkes; Emma Cramer; Holger H. Hoos; Sebastian Trimpe

arXiv:2406.18293·cs.LG·October 10, 2024

TL;DR

This paper presents a methodology for jointly optimizing hyperparameters and reward functions in deep reinforcement learning, demonstrating improved performance and stability across multiple environments.

Contribution

The paper introduces an approach that optimises hyperparameters and reward functions together rather than separately, motivated by their mutual dependence, and shows that doing so improves RL performance.
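To make the idea of a combined search concrete, here is a minimal sketch in which one hyperparameter and one reward parameter are sampled together and scored as a single configuration. Everything in it is an assumption made for illustration: `toy_score` is a hypothetical stand-in for a full RL training run, the parameter names and ranges are invented, and plain random search is swapped in for brevity. It is not claimed to be the optimiser or the search space used in the paper.

```python
import random


def toy_score(learning_rate, reward_weight):
    """Hypothetical stand-in for the final performance of an RL training run.

    The best learning rate shifts with the reward weight, so neither
    parameter can be tuned well in isolation. The maximum (0.0) is at
    learning_rate=0.02, reward_weight=2.0.
    """
    best_lr = 0.01 * reward_weight  # optimum of lr depends on the reward scale
    return -((learning_rate - best_lr) ** 2) * 1e4 - (reward_weight - 2.0) ** 2


def sample_config(rng):
    """Draw one joint configuration (ranges are illustrative assumptions)."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform in [1e-4, 1e-1]
        "reward_weight": rng.uniform(0.1, 5.0),
    }


def joint_random_search(n_trials, seed=0):
    """Search hyperparameter and reward parameter together, keeping the best."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = toy_score(**config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    config, score = joint_random_search(n_trials=200)
    print(config, score)
```

Because the optimum of `learning_rate` moves with `reward_weight` in this toy objective, tuning one while holding the other at a poor value would settle on the wrong setting, which is the kind of mutual dependence the paper describes.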

Findings

01. Combined optimization outperforms the baseline in half of the environments.

02. Including a variance penalty improves policy stability.

03. The approach achieves competitive results with minimal additional computational cost.
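The variance penalty in the second finding can be pictured as an objective that rewards high average return but subtracts a term for spread across training seeds. The sketch below is one possible form, stated under assumptions: the function name, the `penalty_weight` parameter, and the use of the standard deviation as the spread measure are all illustrative choices, since this page does not give the paper's exact formulation.

```python
from statistics import mean, pstdev


def penalised_objective(returns_per_seed, penalty_weight=1.0):
    """Mean return across seeds minus a weighted standard deviation.

    Assumed form for illustration only: the paper's exact penalty is not
    given on this page. With penalty_weight=0 this is the plain mean.
    """
    if not returns_per_seed:
        raise ValueError("need at least one return")
    spread = pstdev(returns_per_seed)  # population standard deviation
    return mean(returns_per_seed) - penalty_weight * spread


if __name__ == "__main__":
    stable = [100.0, 100.0, 100.0]
    unstable = [150.0, 100.0, 50.0]
    # Same mean return, but the unstable configuration scores lower.
    print(penalised_objective(stable), penalised_objective(unstable))
```

Under such an objective, two configurations with the same mean return are ranked by how consistently they reach it, so the search favours configurations whose policies train reliably.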

Abstract

There has been significant progress in deep reinforcement learning (RL) in recent years. Nevertheless, finding suitable hyperparameter configurations and reward functions remains challenging even for experts, and performance heavily relies on these design choices. Also, most RL research is conducted on known benchmarks where knowledge about these choices already exists. However, novel practical applications often pose complex tasks for which no prior knowledge about good hyperparameters and reward functions is available, thus necessitating their derivation from scratch. Prior work has examined automatically tuning either hyperparameters or reward functions individually. We demonstrate empirically that an RL algorithm's hyperparameter configurations and reward function are often mutually dependent, meaning neither can be fully optimised without appropriate values for the other. We then…

Code & Models

Repositories

ada-research/combined_hpo_and_reward_shaping (Official)
