Accelerated Neural Network Training with Rooted Logistic Objectives

Zhu Wang; Praveen Raj Veluswami; Harsh Mishra; Sathya N. Ravi

arXiv:2310.03890·cs.LG·October 9, 2023

Open Access · 4 Reviews

TL;DR

This paper introduces a rooted logistic loss function that, according to the authors, speeds up convergence and improves performance when training neural networks across several models and applications.

Contribution

The paper derives a sequence of strictly convex loss functions from the landscape design of the logistic function and extends their application to deep models and generative tasks.

Findings

01. Faster convergence in training deep neural networks.

02. Performance improvements on classification benchmarks.

03. Effective application in generative model fine-tuning.

Abstract

Many neural networks deployed in real-world scenarios are trained using cross-entropy-based loss functions. From the optimization perspective, it is known that the behavior of first-order methods such as gradient descent crucially depends on the separability of datasets. In fact, even in the simplest case of binary classification, the rate of convergence depends on two factors: (1) the condition number of the data matrix, and (2) the separability of the dataset. Without further pre-processing techniques such as over-parametrization or data augmentation, separability is an intrinsic quantity of the data distribution under consideration. We focus on the landscape design of the logistic function and derive a novel sequence of strictly convex functions that are at least as strict as logistic loss. The minimizers of these functions coincide with those of the minimum norm solution…
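The abstract is cut off before the objective itself is stated, so the following is only a sketch of the general idea, not the authors' definition. It assumes the standard limit identity log(u) = lim k·(u^(1/k) − 1) as k grows, applied to the binary logistic loss log(1 + exp(−m)) for a margin m. The names `logistic_loss` and `rooted_loss` and the exact form of the rooted surrogate are hypothetical choices made for illustration.

```python
import math


def logistic_loss(margin: float) -> float:
    """Binary logistic loss log(1 + exp(-margin)), computed stably."""
    # max(-m, 0) + log1p(exp(-|m|)) equals log(1 + exp(-m)) without overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def rooted_loss(margin: float, k: float) -> float:
    """Hypothetical rooted surrogate k * ((1 + exp(-margin))**(1/k) - 1).

    Assumption: this form comes from log(u) ~ k * (u**(1/k) - 1); it is
    not taken from the paper, whose definition is not shown on this page.
    """
    # (1 + exp(-m))**(1/k) = exp(logistic_loss(m) / k), so expm1 gives the
    # bracketed term directly and stays accurate for large k.
    return k * math.expm1(logistic_loss(margin) / k)


if __name__ == "__main__":
    m = 0.5
    print("logistic:", logistic_loss(m))
    for k in (1, 2, 10, 1000):
        print(f"k={k}: rooted={rooted_loss(m, k)}")
```

Under this assumed form, k = 1 reduces to the exponential loss exp(−m), the surrogate is never smaller than the logistic loss, and it approaches the logistic loss as k grows. Here k plays the role of the extra tuning parameter that the reviews below mention.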

Peer Reviews

Decision · ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

The RLO objective seems novel to me. Advertised to be a better alternative to logistic loss, the proposed RLO can potentially be widely applied to various supervised tasks. In this work, the authors not only evaluated standard image classification but also extended to training the discriminator in GAN models. I like the inclusion of a toy data case to provide more visual clues.

Weaknesses

## Weak theoretical analysis

First, the scope of this work is on substituting the logistic loss, or more precisely, approximating the log function with polynomials. However, in classification, the logistic loss is only one of the many surrogate losses for the more fundamental 0-1 loss. The authors did not discuss any other surrogate loss functions and how they relate to the 0-1 loss. [1] is a related work.

Second, the theoretical statements are hand-wavy. For instance, the "better conditionin

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

The presentation of this paper is clear.

Weaknesses

1. RLO lacks a mathematical derivation for replacing the log with a $1/k$ power in the cross-entropy loss. In fact, cross-entropy minimizes $-\log P$, where $P$ is the likelihood. As the data are i.i.d., the likelihood of all the data can be written as the product of the likelihoods of the individual samples, e.g., $$\log P(y_1,y_2,\dots,y_n|x_1,x_2,\dots,x_n)= \log \prod_{i=1}^n P(y_i|x_i) = \sum_{i=1}^n \log P(y_i|x_i).$$ In this paper, it replaces log with $(\cdot)^{1/k}$ and still sums them togethe

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. The paper is overall well-organized. 2. The proposed method is interesting and novel to the best of the reviewer's knowledge.

Weaknesses

1. The experiments conducted in the paper are based on datasets that are too simple, and the corresponding baseline test accuracy is not reasonable (<90% test accuracy for CIFAR-10). The reviewer would appreciate it if results on more realistic datasets could be included. 2. The reviewer didn't check the full derivation in the Appendix, but the derivation shown in the paper seems a bit sloppy (see Questions), which somewhat undermines the soundness of the paper.

Reviewer 04 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

1. This paper proposes a new rooted loss objective. 2. This paper is sound. 3. This paper provides a lot of experiments on different datasets of supervised and unsupervised learning.

Weaknesses

1. The contribution of this paper is limited. The new objective loss function is based on the approximation of the natural logarithm function. If using the proposed loss objective, we introduce a new tuning parameter $k$. It may take a lot of time to tune this new parameter, but the improvement of the performance is not significant enough, as shown in Table 3. 2. Some parts of the paper are not explained clearly. For example, this paper mentions that the reason of generalization bounds for logi

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Advanced Neural Network Applications

Methods: Dense Connections · Adaptive Instance Normalization · R1 Regularization · Feedforward Network · Convolution · Focus · StyleGAN