Using Deep Q-Learning to Control Optimization Hyperparameters

Samantha Hansen

arXiv:1602.04062 · math.OC · June 21, 2016 · 23 cites

Open Access

TL;DR

This paper introduces a deep Q-learning approach to adaptively control hyperparameters during optimization, demonstrating improved convergence over traditional line search methods in neural network training.

Contribution

The paper develops a novel reinforcement learning framework for hyperparameter control, specifically using deep Q-networks to learn effective learning rate adjustment policies.

Findings

01. DQNs learn policies similar to line search methods

02. Q-gradient descent outperforms traditional gradient descent

03. Convergence of q-values indicates effective learning

Abstract

We present a novel definition of the reinforcement learning state, actions and reward function that allows a deep Q-network (DQN) to learn to control an optimization hyperparameter. Using Q-learning with experience replay, we train two DQNs to accept a state representation of an objective function as input and output the expected discounted return of rewards, or q-values, connected to the actions of either adjusting the learning rate or leaving it unchanged. The two DQNs learn a policy similar to a line search, but differ in the number of allowed actions. The trained DQNs in combination with a gradient-based update routine form the basis of the Q-gradient descent algorithms. To demonstrate the viability of this framework, we show that the DQN's q-values associated with optimal action converge and that the Q-gradient descent algorithms outperform gradient descent with an Armijo or…
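The pieces named in the abstract (a small action set that adjusts or keeps the learning rate, Q-learning targets, experience replay, and a gradient step driven by the chosen action) can be illustrated with a minimal sketch. This is not the paper's implementation: the halving factor, the action names, the `ReplayBuffer` class, and the helper functions below are all hypothetical choices made for illustration, and the q-values are passed in as plain lists rather than produced by a trained network.

```python
import random
from collections import deque

# Hypothetical action set. The abstract only says an action either adjusts
# the learning rate or leaves it unchanged; halving is an assumption.
ACTIONS = ("halve", "keep")


def apply_action(lr, action):
    """Return the learning rate after applying one action."""
    return lr * 0.5 if action == "halve" else lr


def q_target(reward, next_q_values, gamma=0.99, terminal=False):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s', terminal) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop out first

    def push(self, state, action, reward, next_state, terminal):
        self.buffer.append((state, action, reward, next_state, terminal))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


def q_gradient_step(x, grad, lr, q_values):
    """Pick the greedy action from q_values, adjust lr, take a gradient step.

    q_values stands in for the output of a trained DQN (assumed here).
    """
    best = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    action = ACTIONS[best]
    new_lr = apply_action(lr, action)
    return x - new_lr * grad, new_lr, action
```

For example, `q_gradient_step(1.0, 2.0, 0.5, [1.0, 0.0])` selects the hypothetical "halve" action because its q-value is larger, then steps with the reduced learning rate. In a full training loop, transitions would be pushed into the buffer and sampled in minibatches to fit the network toward `q_target`.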

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Neural Networks and Applications

Methods: Q-Learning