On the existence of the maximum likelihood estimate and convergence rate under gradient descent for multi-class logistic regression
Dwight Nwaigwe, Marek Rychlik

TL;DR
This paper investigates conditions under which the maximum likelihood estimate (MLE) exists in multi-class logistic regression without relying on data separability. It also estimates the convergence rate of gradient descent by bounding the condition number of the Hessian.
Contribution
It shows that the MLE is guaranteed to exist when every class in the sample dataset is assigned positive probability, whether or not the data are separable. It also gives a general, constructive estimate of the convergence rate of gradient descent.
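The existence claim can be illustrated numerically. The following is a minimal sketch that assumes softmax regression with a weight matrix `W`, a bias column, and plain fixed-step gradient descent. The toy data, the smoothing level `eps`, and all function names are hypothetical, and the soft labels come from ordinary label smoothing rather than from the authors' construction. On linearly separable data with one-hot labels, the weight norm keeps growing because no finite MLE exists. When every class has positive probability, gradient descent settles at a finite point.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll_grad(W, X, Y):
    # Gradient of the average negative log-likelihood
    #   L(W) = -(1/n) * sum_i sum_k Y[i, k] * log softmax(X @ W)[i, k],
    # which equals X^T (P - Y) / n when every row of Y sums to 1.
    P = softmax(X @ W)
    return X.T @ (P - Y) / X.shape[0]

def gradient_descent(X, Y, W, lr=0.5, steps=10_000):
    # Plain fixed-step gradient descent starting from a copy of W.
    W = W.copy()
    for _ in range(steps):
        W -= lr * nll_grad(W, X, Y)
    return W

# Hypothetical toy data: three linearly separable 2-D clusters plus a bias column.
pts = np.array([[-2.0, 0.0], [-2.5, 0.5],
                [2.0, 0.0], [2.5, -0.5],
                [0.0, 2.5], [0.5, 3.0]])
X = np.hstack([pts, np.ones((len(pts), 1))])
labels = np.array([0, 0, 1, 1, 2, 2])
K = 3

Y_hard = np.eye(K)[labels]               # classical setup: each sample in one class
eps = 0.3                                # hypothetical smoothing level
Y_soft = (1 - eps) * Y_hard + eps / K    # every class gets probability >= eps / K > 0

results = {}
for name, Y in [("hard", Y_hard), ("soft", Y_soft)]:
    W0 = np.zeros((X.shape[1], K))
    W1 = gradient_descent(X, Y, W0)      # after 10,000 steps
    W2 = gradient_descent(X, Y, W1)      # after 20,000 steps
    g = np.linalg.norm(nll_grad(W2, X, Y))
    results[name] = (np.linalg.norm(W1), np.linalg.norm(W2), g)
    print(f"{name}: |W| = {results[name][0]:.4f} -> {results[name][1]:.4f}, |grad| = {g:.1e}")
```

With one-hot labels, the norm of `W` keeps increasing between the two checkpoints. With smoothed labels, the norm stays fixed and the gradient vanishes, which is consistent with a finite minimizer.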
Findings
The MLE exists if every class is assigned positive probability.
The convergence-rate estimate for gradient descent is obtained by bounding the condition number of the Hessian.
The analysis relies on a simple operator-theoretic framework.
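The standard textbook bound for gradient descent shows why a bound on the condition number yields a rate. This bound is not necessarily the estimate derived in the paper. Assume that f is twice differentiable and that m I ⪯ ∇²f ⪯ M I, with 0 < m ≤ M, on a convex set that contains every iterate and the minimizer w*. The softmax shift invariance must be removed first, for example by fixing one class's weights at zero. Because the logistic negative log-likelihood is not globally strongly convex, m has to be taken over such a set rather than over all of parameter space.

```latex
\kappa = \frac{M}{m}, \qquad
w_{k+1} = w_k - \frac{1}{M}\,\nabla f(w_k),
\qquad
f(w_k) - f(w^\ast) \le \left(1 - \frac{1}{\kappa}\right)^{k}\bigl(f(w_0) - f(w^\ast)\bigr)
\le e^{-k/\kappa}\bigl(f(w_0) - f(w^\ast)\bigr).
```

For example, κ = 10 gives a contraction factor of 0.9 per step. Shrinking the initial gap by a factor of 10^6 therefore takes at most ⌈ln(10^6) / (−ln 0.9)⌉ = 132 iterations.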
Abstract
We revisit the problem of the existence of the maximum likelihood estimate for multi-class logistic regression. We show that one way to ensure its existence is to assign positive probability to every class in the sample dataset. The notion of data separability is not needed, in contrast to the classical setup of multi-class logistic regression, in which each data sample belongs to exactly one class. We also provide a general and constructive estimate of the convergence rate to the maximum likelihood estimate when gradient descent is used as the optimizer. Our estimate involves bounding the condition number of the Hessian of the maximum likelihood function. The approaches used in this article rely on a simple operator-theoretic framework.
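To make the Hessian's condition number concrete, one can evaluate it numerically for a small problem. This sketch assumes a reference-class parametrization, in which the last class's weights are fixed at zero to remove the softmax shift invariance. It also assumes hypothetical toy data with label-smoothed targets and the closed-form Hessian H = (1/n) Σ_i (diag(q_i) − q_i q_iᵀ) ⊗ x_i x_iᵀ, where q_i holds the probabilities of the K − 1 free classes. Because κ is computed only at the final iterate, it is a local diagnostic and not a bound valid along the whole optimization path.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def probs(theta, X, K):
    # Reference-class parametrization: theta holds the weights of classes 0..K-2,
    # and class K-1 has its logit fixed at 0 (assumption made for identifiability).
    V = theta.reshape(K - 1, X.shape[1])   # row k = weight vector of class k
    logits = np.hstack([X @ V.T, np.zeros((X.shape[0], 1))])
    return softmax(logits)

def grad(theta, X, Y):
    # Gradient of the average negative log-likelihood, flattened in theta's order.
    K = Y.shape[1]
    R = (probs(theta, X, K) - Y)[:, :K - 1]  # residuals of the free classes
    return (R.T @ X).ravel() / X.shape[0]

def hessian(theta, X, K):
    # H = (1/n) * sum_i kron(diag(q_i) - q_i q_i^T, x_i x_i^T).
    n, d = X.shape
    P = probs(theta, X, K)
    H = np.zeros(((K - 1) * d, (K - 1) * d))
    for x, p in zip(X, P):
        q = p[:K - 1]
        H += np.kron(np.diag(q) - np.outer(q, q), np.outer(x, x))
    return H / n

# Hypothetical toy data with label-smoothed targets (all class probabilities > 0).
pts = np.array([[-2.0, 0.0], [-2.5, 0.5],
                [2.0, 0.0], [2.5, -0.5],
                [0.0, 2.5], [0.5, 3.0]])
X = np.hstack([pts, np.ones((len(pts), 1))])
labels = np.array([0, 0, 1, 1, 2, 2])
K, eps = 3, 0.3
Y = (1 - eps) * np.eye(K)[labels] + eps / K

theta = np.zeros((K - 1) * X.shape[1])
for _ in range(20_000):                    # fixed-step gradient descent
    theta -= 0.5 * grad(theta, X, Y)

H = hessian(theta, X, K)
eigs = np.linalg.eigvalsh(H)               # ascending eigenvalues of symmetric H
kappa = eigs[-1] / eigs[0]
print(f"kappa = {kappa:.2f}, local contraction factor 1 - 1/kappa = {1 - 1 / kappa:.4f}")
```

A finite-difference check of `grad` against `hessian` is a cheap way to confirm that the closed form and the Kronecker ordering agree.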
