Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert
Yoonhyung Lee, Sungdong Lee, and Joong-Ho Won

TL;DR
This paper analyzes implicit stochastic gradient descent (ISGD) methods, deriving error bounds and online covariance estimators for statistical inference without relying on generalized linear model assumptions.
Contribution
It provides the first online covariance estimators for ISGD, enabling valid confidence intervals for model parameters in a broad setting.
Findings
Derived non-asymptotic error bounds for proxRM and proxPR
Proposed online estimators for asymptotic covariance matrices
Constructed valid confidence intervals for model parameters
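To illustrate how a covariance estimate is turned into confidence intervals, here is a minimal sketch of coordinate-wise Wald intervals. It is not the paper's estimator: `wald_intervals`, `theta_hat`, `cov_hat`, and `n` are hypothetical names, and the sketch assumes the estimate is asymptotically normal at the usual sqrt(n) rate (as is typical for averaged iterates; non-averaged iterates generally need a different normalization).

```python
from statistics import NormalDist

import numpy as np


def wald_intervals(theta_hat, cov_hat, n, level=0.95):
    """Coordinate-wise Wald intervals (illustrative, hypothetical helper).

    Assumes sqrt(n) * (theta_hat - theta_star) is approximately N(0, cov_hat).
    """
    # Two-sided standard normal quantile, e.g. about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Standard error of each coordinate under the assumed sqrt(n) scaling.
    half_width = z * np.sqrt(np.diag(cov_hat) / n)
    return theta_hat - half_width, theta_hat + half_width


lower, upper = wald_intervals(np.array([1.0]), np.array([[4.0]]), n=100)
```

In an online setting, `cov_hat` would be whatever running covariance estimate is maintained during the single ISGD pass.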
Abstract
The implicit stochastic gradient descent (ISGD), a proximal version of SGD, is gaining interest in the literature due to its stability over (explicit) SGD. In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely proximal Robbins-Monro (proxRM) and proximal Polyak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters. Specifically, we derive non-asymptotic point estimation error bounds of both proxRM and proxPR iterates and their limiting distributions, and propose on-line estimators of their asymptotic covariance matrices that require only a single run of ISGD. The latter estimators are used to construct valid confidence intervals for the model parameters. Our analysis is free of the generalized linear model assumption that has limited the preceding analyses, and employs feasible procedures. Our…
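As a rough illustration of the implicit update, the sketch below runs ISGD on a least-squares loss, where the implicit equation theta_n = theta_{n-1} - gamma_n * grad f(theta_n; x_n, y_n) can be solved in closed form. Least squares is chosen only for that convenience; the paper's analysis is not restricted to it. The function name, the step-size schedule gamma_n = gamma0 * n^(-alpha), and the reading of proxRM as the last iterate and proxPR as the running average of iterates are assumptions made for this example.

```python
import numpy as np


def isgd_least_squares(X, y, gamma0=1.0, alpha=0.6):
    """One pass of implicit SGD on the loss 0.5 * (x @ theta - y)**2.

    Returns the last iterate and the running average of all iterates.
    """
    n, d = X.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    for i in range(n):
        x_i, y_i = X[i], y[i]
        gamma = gamma0 * (i + 1) ** (-alpha)  # assumed step-size schedule
        resid = x_i @ theta - y_i
        # Closed-form solution of the implicit equation: the residual at the
        # new iterate equals resid / (1 + gamma * ||x_i||^2).
        theta = theta - gamma / (1.0 + gamma * (x_i @ x_i)) * resid * x_i
        # Online (Polyak-Ruppert style) running mean of the iterates.
        theta_bar += (theta - theta_bar) / (i + 1)
    return theta, theta_bar


rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(20000, 3))
y = X @ theta_true + rng.normal(scale=0.5, size=20000)
theta_last, theta_avg = isgd_least_squares(X, y)
```

The shrinkage factor 1 / (1 + gamma * ||x_i||^2) is what keeps the update bounded for large step sizes, which is the stability advantage over explicit SGD that the abstract refers to.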
Taxonomy
Topics: Statistical Methods and Inference · Stochastic Gradient Optimization Techniques · Statistical Methods and Bayesian Inference
Methods: Stochastic Gradient Descent
