Cross-Entropy Optimization for Hyperparameter Optimization in Stochastic Gradient-based Approaches to Train Deep Neural Networks

Kevin Li; Fulu Li

arXiv:2409.09240 · cs.LG · September 17, 2024

TL;DR

This paper introduces a cross-entropy optimization method for tuning hyperparameters in stochastic gradient-based training of deep neural networks, aiming to improve convergence and generalization.

Contribution

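It presents CEHPO (cross-entropy optimization for hyperparameter optimization), a novel algorithm that applies cross-entropy optimization to hyperparameter tuning in deep learning and is analyzed within the expectation-maximization (EM) framework.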

Findings

1. Effective hyperparameter tuning improves training performance.
2. The method is adaptable to various optimization problems in deep learning.
3. The method provides insights into hyperparameter dynamics during training.

Abstract

In this paper, we present a cross-entropy optimization method for hyperparameter optimization in stochastic gradient-based approaches to training deep neural networks. The value of a learning algorithm's hyperparameter often has a great impact on model performance, for example on convergence speed and generalization performance metrics. While in some cases the hyperparameters of a learning algorithm can be treated as part of the learned parameters, in other scenarios the hyperparameters of a stochastic optimization algorithm such as Adam [5] and its variants are either fixed as constants or changed monotonically over time. We give an in-depth analysis of the presented method in the framework of expectation maximization (EM). The presented algorithm of cross-entropy optimization for hyperparameter optimization of a learning algorithm (CEHPO) is equally applicable to other…
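
The abstract does not spell out CEHPO's update rules, so the following is only a rough illustration of the generic cross-entropy method it builds on, not the authors' algorithm. It assumes a single hyperparameter (log10 of a plain gradient-descent learning rate), a Gaussian sampling distribution, and a hypothetical toy objective `toy_validation_loss` standing in for "train the network and report validation loss". The function name `cross_entropy_search` and all parameter values are illustrative assumptions.

```python
import random
import statistics
from typing import Callable, Tuple


def toy_validation_loss(log10_lr: float) -> float:
    """Hypothetical stand-in objective (lower is better).

    Runs 20 plain gradient-descent steps on f(w) = (w - 3)^2 from w = 0 with
    learning rate 10**log10_lr and returns the final loss. A real use would
    train a network with this hyperparameter and report validation loss.
    """
    lr = 10.0 ** log10_lr
    w = 0.0
    for _ in range(20):
        grad = 2.0 * (w - 3.0)
        w -= lr * grad
        if abs(w) > 1e6:  # diverged: give it the worst possible score
            return float("inf")
    return (w - 3.0) ** 2


def cross_entropy_search(
    objective: Callable[[float], float],
    mu: float,
    sigma: float,
    n_samples: int = 50,
    n_elite: int = 10,
    n_iters: int = 20,
    bounds: Tuple[float, float] = (-6.0, 1.0),
    seed: int = 0,
) -> Tuple[float, float]:
    """Generic cross-entropy method over one scalar hyperparameter.

    Keeps a Gaussian N(mu, sigma^2) as the sampling distribution and, each
    iteration, refits it to the best `n_elite` of `n_samples` draws.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    for _ in range(n_iters):
        # 1) Sample candidates from the current distribution, clipped to bounds.
        samples = [min(hi, max(lo, rng.gauss(mu, sigma))) for _ in range(n_samples)]
        # 2) Rank candidates by objective (each evaluated once) and keep the elite.
        elite = sorted(samples, key=objective)[:n_elite]
        # 3) Refit the Gaussian to the elite set (maximum-likelihood mean and std).
        mu = statistics.fmean(elite)
        # Floor on sigma keeps the distribution from collapsing to a point.
        sigma = max(statistics.pstdev(elite), 1e-3)
    return mu, sigma


if __name__ == "__main__":
    best_log10_lr, spread = cross_entropy_search(toy_validation_loss, mu=-2.0, sigma=2.0)
    print(f"learning rate ~ {10 ** best_log10_lr:.3f} (log10 spread {spread:.3f})")
```

Searching in log space is a common choice for learning rates because useful values span several orders of magnitude. Practical cross-entropy variants often smooth the refit step (blending the new mean and standard deviation with the previous ones) to avoid premature convergence; that is omitted here for brevity.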

Taxonomy

Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications

Methods: Adam