Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors
Atsushi Nitanda, Taiji Suzuki

TL;DR
This paper shows that stochastic gradient descent and its averaging variant can achieve exponential convergence rates for the expected classification error in binary classification under a strong low-noise condition, improving on the sublinear rates of the traditional analysis.
Contribution
It extends exponential convergence results to a broader class of loss functions beyond squared loss, applicable to binary classification with theoretical guarantees.
Findings
Exponential convergence of the expected classification error is shown for a wide class of loss functions.
Averaged stochastic gradient descent achieves exponential convergence from an early phase of training.
Experiments on L2-regularized logistic regression support the theoretical results.
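The setting in the last finding can be illustrated with a small toy experiment. The sketch below is not the authors' experimental setup: it assumes a linear model in two dimensions instead of a general reproducing kernel Hilbert space, synthetic data whose label noise is bounded away from 1/2, a step size of 1/(lam * (t + offset)), and a uniform running average of the iterates. The helper names (make_low_noise_data, averaged_sgd, bayes_disagreement) and all constants are hypothetical.

```python
import numpy as np


def make_low_noise_data(n, rng, flip_prob=0.1, margin=0.3):
    """Toy data: the Bayes rule is sign(x0); labels are flipped with
    probability flip_prob, so P(Y=1|x) is either 0.9 or 0.1."""
    signs = rng.choice([-1.0, 1.0], size=n)
    x0 = signs * rng.uniform(margin, 1.0, size=n)  # informative feature
    x1 = rng.uniform(-1.0, 1.0, size=n)            # pure-noise feature
    X = np.column_stack([x0, x1])
    flips = rng.random(n) < flip_prob
    y = np.where(flips, -signs, signs)
    return X, y


def averaged_sgd(X, y, lam=0.1, offset=10.0):
    """One pass of SGD on L2-regularized logistic loss.
    Returns the last iterate and the uniform average of all iterates."""
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for t in range(n):
        eta = 1.0 / (lam * (t + offset))   # assumed step-size schedule
        m = y[t] * (X[t] @ w)              # margin of the current sample
        # gradient of log(1 + exp(-m)) + (lam / 2) * ||w||^2
        grad = -y[t] * X[t] / (1.0 + np.exp(m)) + lam * w
        w = w - eta * grad
        w_avg += (w - w_avg) / (t + 1)     # running mean of iterates
    return w, w_avg


def bayes_disagreement(w, X):
    """Fraction of points where sign(w . x) differs from the Bayes rule."""
    return float(np.mean(np.sign(X @ w) != np.sign(X[:, 0])))


rng = np.random.default_rng(0)
X_train, y_train = make_low_noise_data(5000, rng)
X_test, _ = make_low_noise_data(2000, rng)

w_last, w_avg = averaged_sgd(X_train, y_train)
err_last = bayes_disagreement(w_last, X_test)
err_avg = bayes_disagreement(w_avg, X_test)
print(f"disagreement with Bayes rule: last={err_last:.4f}, averaged={err_avg:.4f}")
```

Disagreement with the Bayes rule is used here as a proxy for the excess classification error, since under bounded label noise the two are proportional in this toy model. Tracking this quantity over increasing sample sizes is one way to see how quickly the classification error decays compared with the regularized risk.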
Abstract
We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In the traditional analysis using a consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk even when assuming a low-noise condition on the conditional label probabilities. Consequently, the resulting rate is sublinear. Therefore, it is important to consider whether much faster convergence of the expected classification error can be achieved. In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition, but the theoretical analysis provided was limited to the squared loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show an exponential convergence of the expected…
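For orientation, the two notions the abstract relies on can be written out as follows. This is a sketch in assumed notation, not quoted from the paper: \rho denotes the data distribution, \delta the noise margin, g_T the predictor after T iterations, and C, c unspecified positive constants. The strong low-noise condition is commonly stated as the conditional label probability staying bounded away from 1/2, and an exponential rate means the excess expected classification error decays geometrically in T.

```latex
% Strong low-noise condition (common formulation; notation assumed here)
\exists\, \delta \in (0, 1/2): \quad
\left| \rho(Y = 1 \mid x) - \tfrac{1}{2} \right| > \delta
\quad \text{for almost every } x .

% Expected classification error of a predictor g and its Bayes-optimal value
\mathcal{R}(g) = \mathbb{P}\bigl[\operatorname{sign}(g(X)) \neq Y\bigr],
\qquad
\mathcal{R}^{*} = \inf_{g} \mathcal{R}(g) .

% Schematic form of an exponential convergence rate (C, c > 0 unspecified)
\mathbb{E}\bigl[\mathcal{R}(g_T)\bigr] - \mathcal{R}^{*} \le C \exp(-c\,T) .
```

By contrast, a sublinear rate would bound the same quantity by a polynomially decaying term such as a constant times T^{-a} for some a > 0.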
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Numerical methods in inverse problems
