A Unified Scheme of ResNet and Softmax
Zhao Song, Weixin Wang, Junze Yin

TL;DR
This paper introduces a unified theoretical framework combining softmax regression and ResNet through a novel regression problem, analyzing its properties and implications for neural network optimization.
Contribution
The paper provides the first unified scheme connecting softmax regression and ResNet, with a detailed analysis of the resulting loss landscape and optimization methods.
Findings
The Hessian is positive semidefinite and decomposes as the sum of a low-rank matrix and a diagonal matrix.
The gradient, Hessian, and Lipschitz properties of the loss function are derived.
This Hessian structure enables an efficient approximate Newton method.
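The gradient listed among the findings follows from the chain rule. The derivation below is our own sketch, not the paper's: the notation $u$, $\alpha$, $f$, $v$ is introduced here for illustration and may differ from the paper's. It writes the regression objective as $L(x) = \| f(x) - b \|_2^2$ with $u(x) = \exp(Ax) + Ax$, $\alpha(x) = \langle u(x), \mathbf{1}_n \rangle$, and $f(x) = \alpha(x)^{-1} u(x)$. It assumes $\alpha(x) \neq 0$ and omits any $1/2$ factor or regularization term the paper may include. Here $\exp$ acts entrywise and $\circ$ is the entrywise product.

```latex
\begin{aligned}
v(x) &= \exp(Ax) + \mathbf{1}_n, \qquad
\frac{\partial u(x)}{\partial x} = \operatorname{diag}(v(x))\,A, \qquad
\nabla \alpha(x) = A^\top v(x), \\
\frac{\partial f(x)}{\partial x} &= \alpha(x)^{-1}\bigl(I_n - f(x)\,\mathbf{1}_n^\top\bigr)\operatorname{diag}(v(x))\,A, \\
\nabla L(x) &= 2\Bigl(\frac{\partial f(x)}{\partial x}\Bigr)^{\top}\bigl(f(x) - b\bigr)
= \frac{2}{\alpha(x)}\,A^\top\Bigl[v(x) \circ \bigl((f(x) - b) - \langle f(x), f(x) - b\rangle\,\mathbf{1}_n\bigr)\Bigr].
\end{aligned}
```

The middle line uses $\mathbf{1}_n^\top \operatorname{diag}(v) = v^\top$, so the quotient-rule term $f(x)\,\nabla\alpha(x)^\top$ folds into the same $\operatorname{diag}(v)\,A$ factor.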
Abstract
Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical components supporting the functionality of LLMs but also are related to many other machine learning and theoretical computer science fields, including but not limited to image classification, object detection, semantic segmentation, and tensors. Previous research works studied these two concepts separately. In this paper, we provide a theoretical analysis of the regression problem: $\| \langle \exp(Ax) + Ax, \mathbf{1}_n \rangle^{-1} (\exp(Ax) + Ax) - b \|_2^2$, where $A$ is a matrix in $\mathbb{R}^{n \times d}$, $b$ is a vector in $\mathbb{R}^n$, and $\mathbf{1}_n$ is the $n$-dimensional vector whose entries are all $1$. This regression problem is a unified…
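A minimal, dependency-free Python sketch of the regression objective $\| \langle \exp(Ax) + Ax, \mathbf{1}_n \rangle^{-1} (\exp(Ax) + Ax) - b \|_2^2$ follows. It stores matrices as lists of rows. The helper names (`matvec`, `forward`, `loss`, `grad`, `finite_diff_grad`) are hypothetical. The analytic gradient is our own chain-rule derivation, not code from the paper, and the sketch checks it against central finite differences. The normalizer $\langle \exp(Ax) + Ax, \mathbf{1}_n \rangle$ can be zero or negative when entries of $Ax$ are sufficiently negative, so the sketch raises an error if it is exactly zero. The demo draws nonnegative $A$ and $x$, which keeps it at least $n$.

```python
import math
import random

# Hypothetical helper names for illustration; not code from the paper.

def matvec(A, x):
    """Return A x for a matrix A stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def forward(A, x):
    """Return (z, alpha, f) with z = Ax, alpha = <exp(z) + z, 1_n>, f = (exp(z) + z) / alpha."""
    z = matvec(A, x)
    u = [math.exp(z_i) + z_i for z_i in z]  # softmax part exp(Ax) plus residual part Ax
    alpha = sum(u)                          # inner product with the all-ones vector 1_n
    if alpha == 0.0:
        raise ZeroDivisionError("normalizer <exp(Ax) + Ax, 1_n> is zero")
    f = [u_i / alpha for u_i in u]
    return z, alpha, f

def loss(A, b, x):
    """L(x) = ||f(x) - b||_2^2, without a 1/2 factor or regularizer (an assumption)."""
    _, _, f = forward(A, x)
    return sum((f_i - b_i) ** 2 for f_i, b_i in zip(f, b))

def grad(A, b, x):
    """Chain-rule gradient: (2 / alpha) A^T [v * ((f - b) - <f, f - b> 1_n)], v = exp(Ax) + 1_n."""
    z, alpha, f = forward(A, x)
    r = [f_i - b_i for f_i, b_i in zip(f, b)]     # residual f(x) - b
    c = sum(f_i * r_i for f_i, r_i in zip(f, r))  # scalar <f, f - b>
    w = [(math.exp(z_i) + 1.0) * (r_i - c) for z_i, r_i in zip(z, r)]
    return [2.0 / alpha * sum(A[i][j] * w[i] for i in range(len(A)))
            for j in range(len(x))]

def finite_diff_grad(A, b, x, h=1e-6):
    """Central-difference approximation of the gradient, used only to check grad()."""
    g = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        g.append((loss(A, b, x_plus) - loss(A, b, x_minus)) / (2.0 * h))
    return g

if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 5, 3
    # Nonnegative A and x keep every entry of exp(Ax) + Ax >= 1, so alpha >= n.
    A = [[rng.uniform(0.0, 0.5) for _ in range(d)] for _ in range(n)]
    x = [rng.uniform(0.0, 0.5) for _ in range(d)]
    b = [1.0 / n] * n
    print("loss:", loss(A, b, x))
    gap = max(abs(p - q) for p, q in zip(grad(A, b, x), finite_diff_grad(A, b, x)))
    print("max |analytic - finite difference|:", gap)
```

Because every entry is divided by the same scalar, the entries of the normalized vector always sum to $1$, as with softmax. Unlike softmax, they need not be positive, since $\exp(t) + t$ is negative for sufficiently negative $t$.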
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Topic Modeling
Methods: Average Pooling · Batch Normalization · Kaiming Initialization · Residual Connection · Residual Block · Global Average Pooling · 1x1 Convolution · Bottleneck Residual Block · Max Pooling
