Mini-batch Estimation for Deep Cox Models: Statistical Foundations and Practical Guidance
Lang Zeng, Weijing Tang, Zhao Ren, Ying Ding

TL;DR
This paper develops statistical theory and practical guidance for mini-batch estimation in deep Cox models trained with SGD, highlighting how batch size and learning rate affect convergence and efficiency.
Contribution
It introduces the mini-batch maximum partial-likelihood estimator (mb-MPLE), establishing its statistical properties and providing practical SGD tuning strategies for Cox models.
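To make the estimator concrete, the objective can be written out as below. The notation is ours, not taken from this page. $T_i$ is the observed time, $\delta_i$ the event indicator, and $f_\theta(X_i)$ the network's log-risk score. The $n$ subjects are assumed to be split into $m = n/s$ disjoint batches $B_1, \dots, B_m$ of size $s$, with each risk set formed only within its batch. This is a sketch: the paper's formal definition may instead average over random partitions (as reshuffling SGD does) and may treat ties differently.

$$
\ell_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \delta_i \Big[ f_\theta(X_i) - \log \sum_{j:\, T_j \ge T_i} \exp\{f_\theta(X_j)\} \Big]
$$

$$
\ell^{\mathrm{mb}}_{s}(\theta) = \frac{1}{m}\sum_{k=1}^{m} \frac{1}{s}\sum_{i \in B_k} \delta_i \Big[ f_\theta(X_i) - \log \sum_{j \in B_k:\, T_j \ge T_i} \exp\{f_\theta(X_j)\} \Big],
\qquad
\hat\theta_s = \arg\max_{\theta}\, \ell^{\mathrm{mb}}_{s}(\theta)
$$

Setting $s = n$ (a single batch) recovers the standard partial likelihood $\ell_n$. For $s < n$ the within-batch risk sets are smaller, so the two objectives, and in general their maximizers, differ.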
Findings
mb-MPLE for Cox-NN is consistent and attains the minimax optimal convergence rate up to a polylogarithmic factor.
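For Cox regression with linear covariate effects, mb-MPLE is $\sqrt{n}$-consistent and asymptotically normal, with asymptotic variance approaching the information lower bound as the batch size increases.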
The ratio of learning rate to batch size critically affects SGD dynamics and convergence.
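The two knobs in this finding can be seen in a plain SGD loop. The block below is a minimal sketch for a linear Cox model; a Cox-NN would replace `X @ beta` with a network. It assumes no tied event times, per-epoch reshuffling, and a constant learning rate. `simulate_cox_data`, `batch_gradient`, and `sgd_cox` are hypothetical helpers, not code from the paper. Each step follows the gradient of the within-batch negative log partial likelihood. As a result, `lr` and `batch_size` jointly determine both the step size and the gradient noise.

```python
import numpy as np


def simulate_cox_data(n, beta, rng):
    """Hypothetical simulator: exponential event times with hazard exp(x'beta)."""
    X = rng.standard_normal((n, len(beta)))
    event_time = rng.exponential(scale=np.exp(-X @ beta))  # rate = exp(x'beta)
    censor_time = rng.exponential(scale=2.0, size=n)  # assumed censoring law
    time = np.minimum(event_time, censor_time)
    event = (event_time <= censor_time).astype(float)
    return X, time, event


def batch_gradient(beta, X, time, event):
    """Gradient of the average negative log partial likelihood on one batch.

    Risk sets are formed within the batch only; assumes no tied times.
    """
    eta = X @ beta
    order = np.argsort(-time, kind="stable")  # descending time: risk set = prefix
    Xs, eta_s, event_s = X[order], eta[order], event[order]
    w = np.exp(eta_s - eta_s.max())  # shift by the max for numerical stability
    cum_w = np.cumsum(w)
    cum_wx = np.cumsum(w[:, None] * Xs, axis=0)
    risk_mean = cum_wx / cum_w[:, None]  # exp(eta)-weighted mean of X over each risk set
    return ((risk_mean - Xs) * event_s[:, None]).sum(axis=0) / len(time)


def sgd_cox(X, time, event, lr, batch_size, epochs, seed=0):
    """Plain SGD with per-epoch reshuffling; lr and batch_size are the two knobs."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(epochs):
        perm = rng.permutation(n)
        # Iterate over full batches only; an incomplete tail batch is dropped.
        for start in range(0, n - batch_size + 1, batch_size):
            idx = perm[start:start + batch_size]
            beta -= lr * batch_gradient(beta, X[idx], time[idx], event[idx])
    return beta


beta_true = np.array([1.0, -0.5, 0.25])
X, time, event = simulate_cox_data(2000, beta_true, np.random.default_rng(1))
beta_hat = sgd_cox(X, time, event, lr=0.2, batch_size=50, epochs=30)
print(beta_hat, "lr/batch_size =", 0.2 / 50)
```

To explore the finding, one might rerun `sgd_cox` over a grid of `(lr, batch_size)` pairs and compare the resulting estimates or loss trajectories, including pairs that share the same ratio `lr / batch_size`. The sketch makes no claim about which settings the paper recommends.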
Abstract
The stochastic gradient descent (SGD) algorithm has been widely used to optimize deep Cox neural networks (Cox-NN) by updating model parameters using mini-batches of data. We show that SGD aims to optimize the average of the mini-batch partial likelihoods, which differs from the standard partial likelihood. This distinction requires establishing new statistical properties for the global optimizer, namely, the mini-batch maximum partial-likelihood estimator (mb-MPLE). We establish that mb-MPLE for Cox-NN is consistent and achieves the optimal minimax convergence rate up to a polylogarithmic factor. For Cox regression with linear covariate effects, we further show that mb-MPLE is $\sqrt{n}$-consistent and asymptotically normal, with asymptotic variance approaching the information lower bound as the batch size increases, which is confirmed by simulation studies. Additionally, we offer practical…
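The abstract's central distinction can be checked numerically. The sketch below makes three assumptions: a linear risk score $\eta_i = x_i^\top\beta$, no tied times, and equal-sized batches. It compares the full-data negative log partial likelihood with the average of per-batch losses, where each risk set contains only subjects from the same batch. The simulator and function names are hypothetical. A within-batch risk set is a subset of the full risk set, so each event's term can only shrink. At the same parameters, the mini-batch loss therefore never exceeds the full-data loss, and the two coincide when the batch size equals $n$.

```python
import numpy as np


def simulate_cox_data(n, beta, rng):
    """Hypothetical simulator: exponential event times with hazard exp(x'beta)."""
    X = rng.standard_normal((n, len(beta)))
    event_time = rng.exponential(scale=np.exp(-X @ beta))  # rate = exp(x'beta)
    censor_time = rng.exponential(scale=2.0, size=n)  # assumed censoring law
    time = np.minimum(event_time, censor_time)
    event = (event_time <= censor_time).astype(float)
    return X, time, event


def neg_log_partial_likelihood(eta, time, event):
    """Average negative log Cox partial likelihood (assumes no tied times)."""
    order = np.argsort(-time, kind="stable")  # descending time: risk set = prefix
    eta_s, event_s = eta[order], event[order]
    log_risk = np.logaddexp.accumulate(eta_s)  # log sum_{j: T_j >= T_i} exp(eta_j)
    return -np.sum(event_s * (eta_s - log_risk)) / len(time)


def minibatch_neg_log_pl(eta, time, event, batch_size, rng):
    """Average of per-batch losses; risk sets only contain subjects in the same batch."""
    n = len(time)
    assert n % batch_size == 0, "sketch assumes equal-sized batches"
    perm = rng.permutation(n)
    losses = [
        neg_log_partial_likelihood(eta[idx], time[idx], event[idx])
        for idx in np.split(perm, n // batch_size)
    ]
    return float(np.mean(losses))


rng = np.random.default_rng(0)
beta = np.array([1.0, -0.5, 0.25])
X, time, event = simulate_cox_data(400, beta, rng)
eta = X @ beta
full = neg_log_partial_likelihood(eta, time, event)
for s in (400, 100, 20):
    print(s, full, minibatch_neg_log_pl(eta, time, event, s, rng))
```

With a single random partition the mini-batch value is itself random. Repeating `minibatch_neg_log_pl` over fresh permutations approximates the average over partitions that reshuffling SGD effectively sees.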
