Revisit Multinomial Logistic Regression in Deep Learning: Data Dependent Model Initialization for Image Recognition
Bowen Cheng, Rong Xiao, Yandong Guo, Yuxiao Hu, Jianfeng Wang, Lei Zhang

TL;DR
This paper proposes a data-dependent initialization method for multinomial logistic regression in deep neural networks, improving training speed and accuracy in image recognition tasks by using a closed-form approximate solution called RGC.
Contribution
It introduces a regularized Gaussian classifier (RGC) for better initialization of logistic regression layers, enhancing convergence and performance over random initialization.
Findings
Reduces training time by up to 10 times in image classification.
Achieves a 3.2% accuracy gain in Flickr-style classification.
Improves training efficiency and accuracy in object detection tasks.
Abstract
We study in this paper how to initialize the parameters of multinomial logistic regression (a fully connected layer followed by softmax and cross-entropy loss), which is widely used in deep neural network (DNN) models for classification problems. Because logistic regression is widely known to have no closed-form solution, it is usually randomly initialized, leading to several deficiencies, especially in transfer learning, where all the layers except the last task-specific layer are initialized using a pre-trained model. The deficiencies include slow convergence, the possibility of getting stuck in a local minimum, and the risk of over-fitting. To address those deficiencies, we first study the properties of logistic regression and propose a closed-form approximate solution named regularized Gaussian classifier (RGC). Then we adopt this approximate solution to initialize the task-specific linear…
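To make the idea of a closed-form, data-dependent initialization concrete, the sketch below fits a shared-covariance Gaussian model to feature vectors and reads off the weights and biases of a linear softmax layer. It is an illustration only: the exact RGC formulation is not given in this summary, so the ridge term `reg`, the function name `gaussian_init`, and the synthetic data are all assumptions standing in for the paper's method.

```python
import numpy as np

def gaussian_init(features, labels, num_classes, reg=1e-3):
    """Closed-form (W, b) for a linear softmax layer.

    Assumed stand-in for RGC: a Gaussian classifier with one covariance
    shared by all classes, plus a ridge term `reg` on that covariance.
    Assumes every class has at least one sample.
    """
    n, d = features.shape
    means = np.zeros((num_classes, d))
    priors = np.zeros(num_classes)
    cov = np.zeros((d, d))
    for k in range(num_classes):
        xk = features[labels == k]
        means[k] = xk.mean(axis=0)
        priors[k] = len(xk) / n
        centered = xk - means[k]
        cov += centered.T @ centered      # pooled within-class scatter
    cov = cov / n + reg * np.eye(d)       # regularized shared covariance
    # Row k of W is Sigma^{-1} mu_k; solve a linear system instead of inverting.
    W = np.linalg.solve(cov, means.T).T
    # b_k = -0.5 * mu_k^T Sigma^{-1} mu_k + log(prior_k)
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)
    return W, b

# Synthetic "pre-trained features": three well-separated 2-D clusters.
rng = np.random.default_rng(0)
centers = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
labels = np.repeat(np.arange(3), 100)
features = centers[labels] + 0.5 * rng.standard_normal((300, 2))

W, b = gaussian_init(features, labels, num_classes=3)
logits = features @ W.T + b               # what the initialized layer would output
accuracy = float(np.mean(logits.argmax(axis=1) == labels))
print(f"accuracy of the initialized layer: {accuracy:.3f}")
```

In a transfer-learning setting, `features` would be the activations of the pre-trained network just before the task-specific layer, and `W` and `b` would be copied into that layer before fine-tuning begins.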
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and ELM
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Logistic Regression · Linear Layer · Softmax
