Using Deep Q-Learning to Control Optimization Hyperparameters
Samantha Hansen

TL;DR
This paper introduces a deep Q-learning approach to adaptively control hyperparameters during optimization, demonstrating improved convergence over traditional line search methods in neural network training.
Contribution
The paper develops a novel reinforcement learning framework for hyperparameter control, specifically using deep Q-networks to learn effective learning rate adjustment policies.
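The Q-learning-with-experience-replay training loop described in the abstract can be sketched as below. This is a minimal illustration, not the paper's implementation. A linear q-function (`LinearQ`) stands in for the deep network, and the class and function names (`ReplayBuffer`, `q_learning_step`), the action encoding and the state vector are hypothetical. The paper's actual state representation, reward and network architecture are not given here, and this sketch omits refinements such as a separate target network.

```python
import random
from collections import deque

import numpy as np


class LinearQ:
    """Hypothetical linear stand-in for a DQN: q(s, a) = W[a] @ s + b[a]."""

    def __init__(self, state_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_actions, state_dim))
        self.b = np.zeros(n_actions)

    def q_values(self, s):
        # One q-value per action, e.g. 0 = leave learning rate unchanged, 1 = adjust it.
        return self.W @ s + self.b


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, k, rng):
        # Uniform sampling without replacement breaks correlation between consecutive steps.
        return rng.sample(list(self.buf), k)

    def __len__(self):
        return len(self.buf)


def q_learning_step(q, buffer, batch_size, gamma, lr, rng):
    """One Q-learning update on a minibatch drawn from the replay buffer."""
    for s, a, r, s_next, done in buffer.sample(batch_size, rng):
        # Bootstrapped target r + gamma * max_a' q(s', a'); just r at episode end.
        target = r if done else r + gamma * np.max(q.q_values(s_next))
        td_error = target - q.q_values(s)[a]
        # Semi-gradient step on 0.5 * td_error**2 for the parameters of action a only.
        q.W[a] += lr * td_error * s
        q.b[a] += lr * td_error
    return q
```

In a full DQN the linear map would be replaced by a neural network trained by minibatch gradient descent on the same temporal-difference targets.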
Findings
DQNs learn policies similar to line search methods
Q-gradient descent outperforms gradient descent with a line search (e.g., Armijo)
The q-values associated with the optimal action converge, indicating effective learning
Abstract
We present a novel definition of the reinforcement learning state, actions and reward function that allows a deep Q-network (DQN) to learn to control an optimization hyperparameter. Using Q-learning with experience replay, we train two DQNs to accept a state representation of an objective function as input and output the expected discounted return of rewards, or q-values, connected to the actions of either adjusting the learning rate or leaving it unchanged. The two DQNs learn a policy similar to a line search, but differ in the number of allowed actions. The trained DQNs in combination with a gradient-based update routine form the basis of the Q-gradient descent algorithms. To demonstrate the viability of this framework, we show that the DQN's q-values associated with the optimal action converge and that the Q-gradient descent algorithms outperform gradient descent with an Armijo or…
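To make the comparison in the abstract concrete, the sketch below contrasts gradient descent with Armijo backtracking against a Q-gradient-descent-style loop in which greedy actions on q-values either leave the step size unchanged or shrink it. It is an illustration under stated assumptions, not the paper's algorithm. `line_search_like_q` is a hypothetical hand-written stand-in for a trained DQN, and the state features (current objective, trial objective, squared gradient norm), the shrink factor and the two-action set are assumptions chosen to mimic the line-search-like policy the abstract reports.

```python
import numpy as np


def armijo_gd(f, grad, x0, alpha0=1.0, c=1e-4, rho=0.5, iters=50):
    """Gradient descent with Armijo backtracking on the step size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        alpha = alpha0
        # Shrink alpha until f(x - alpha g) <= f(x) - c * alpha * ||g||^2.
        while f(x - alpha * g) > f(x) - c * alpha * (g @ g):
            alpha *= rho
        x = x - alpha * g
    return x


def q_gradient_descent(f, grad, x0, q_fn, alpha0=1.0, factor=0.5, iters=50):
    """Gradient descent whose step size is controlled by greedy actions on q-values.

    Hypothetical actions: 0 = leave alpha unchanged, 1 = multiply alpha by `factor`.
    """
    x = np.asarray(x0, dtype=float)
    alpha = alpha0
    for _ in range(iters):
        g = grad(x)
        # Hypothetical state features; the paper defines its own state representation.
        state = np.array([f(x), f(x - alpha * g), g @ g])
        if int(np.argmax(q_fn(state))) == 1:
            alpha *= factor
        x = x - alpha * g
    return x


def line_search_like_q(state):
    """Hand-written stand-in for a trained DQN: prefer shrinking when the trial step fails."""
    f_now, f_trial, _ = state
    return np.array([1.0, 0.0]) if f_trial < f_now else np.array([0.0, 1.0])


if __name__ == "__main__":
    # Ill-conditioned quadratic f(x) = 0.5 * x^T A x with minimizer at the origin.
    A = np.diag([1.0, 10.0])
    f = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x
    x0 = np.array([1.0, 1.0])
    print("Armijo GD:  ", np.linalg.norm(armijo_gd(f, grad, x0)))
    print("Q-style GD: ", np.linalg.norm(q_gradient_descent(f, grad, x0, line_search_like_q)))
```

The stand-in policy only ever shrinks the step size, so it behaves like a crude line search spread over iterations; a learned DQN could in principle also learn when leaving the step size unchanged is better, which is the behaviour the paper trains for.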
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Neural Networks and Applications
Methods: Q-Learning
