Scalable likelihood-based estimation and variable selection for the Cox model with incomplete covariates
Ngok Sang Kwok, Kin Yau Wong

TL;DR
This paper introduces a scalable EM algorithm, built on a transformation technique, for likelihood-based Cox regression with incomplete covariates. The algorithm enables efficient estimation and variable selection when many covariates are missing.
Contribution
It develops a novel EM algorithm whose E-step requires only a one-dimensional integration, and it extends the algorithm with a LASSO penalty for variable selection, improving scalability and applicability.
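One way a multivariate E-step integral can collapse to a single dimension is sketched below. This is an illustration under an assumption the summary does not state, not the authors' transformation: the missing covariates are taken to be jointly normal given the observed ones. Because a subject's Cox likelihood contribution, exp(Δ·β'Z)·exp(−Λ(T)·exp(β'Z)), depends on the missing block Z_mis only through the scalar u = β_mis'Z_mis, conditional expectations can be computed by one-dimensional Gauss-Hermite quadrature over u. E[Z_mis | data] then follows from the linear regression of Z_mis on u. The function names `conditional_normal` and `e_step_1d` are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def conditional_normal(mu, Sigma, obs_idx, mis_idx, z_obs):
    """Mean and covariance of Z_mis given Z_obs = z_obs when Z ~ N(mu, Sigma)."""
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    S_mo = Sigma[np.ix_(mis_idx, obs_idx)]
    S_mm = Sigma[np.ix_(mis_idx, mis_idx)]
    K = np.linalg.solve(S_oo, S_mo.T).T          # S_mo @ inv(S_oo)
    m = mu[mis_idx] + K @ (z_obs - mu[obs_idx])
    V = S_mm - K @ S_mo.T
    return m, V


def e_step_1d(beta_obs, beta_mis, z_obs, m, V, delta, Lam, n_nodes=40):
    """Posterior E[exp(beta'Z)] and E[Z_mis] for one subject via 1-D quadrature.

    Assumes Z_mis | Z_obs ~ N(m, V); delta is the event indicator and Lam the
    cumulative baseline hazard at the subject's observed time.
    """
    eta_obs = beta_obs @ z_obs
    a = beta_mis @ m                             # mean of u = beta_mis' Z_mis
    Vb = V @ beta_mis                            # Cov(Z_mis, u)
    b2 = beta_mis @ Vb                           # Var(u)
    x, w = hermegauss(n_nodes)                   # nodes/weights for exp(-x^2/2)
    u = a + np.sqrt(b2) * x                      # quadrature nodes for u
    eta = eta_obs + u
    # Log Cox likelihood contribution, dropping factors free of Z_mis.
    log_terms = np.log(w) + delta * eta - Lam * np.exp(eta)
    p = np.exp(log_terms - log_terms.max())
    p /= p.sum()                                 # posterior weights on the nodes
    E_u = p @ u
    E_exp = p @ np.exp(eta)
    # E[Z_mis | u] = m + Vb (u - a) / b2 is linear in u, so only E[u | data] is needed.
    E_zmis = m + Vb * ((E_u - a) / b2) if b2 > 0 else m
    return E_exp, E_zmis
```

Because the quadrature runs over one scalar regardless of how many covariates are missing, the per-subject cost of this sketch grows only through the matrix products that form m, V and Vβ, which mirrors the scalability argument in the abstract.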
Findings
The algorithm is computationally efficient when many covariates are missing
The method outperforms existing approaches in simulations
The method is successfully applied to cancer genomic data
Abstract
Regression analysis with missing data is a long-standing and challenging problem, particularly when there are many missing variables with arbitrary missing patterns. Likelihood-based methods, although theoretically appealing, are often computationally inefficient or even infeasible when dealing with a large number of missing variables. In this paper, we consider the Cox regression model with incomplete covariates that are missing at random. We develop an expectation-maximization (EM) algorithm for nonparametric maximum likelihood estimation, employing a transformation technique in the E-step so that it involves only a one-dimensional integration. This innovation makes our methods scalable with respect to the dimension of the missing variables. In addition, for variable selection, we extend the proposed EM algorithm to accommodate a LASSO penalty in the likelihood. We demonstrate the…
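The effect of a LASSO penalty can be illustrated in a simpler, swapped-in setting: fully observed covariates and the Cox partial likelihood, rather than the full likelihood maximized inside the authors' EM algorithm. The sketch below minimizes the average negative log partial likelihood plus λ‖β‖₁ by proximal gradient descent with backtracking. The soft-thresholding step is what sets coefficients exactly to zero. It assumes no tied event times, and the function names (`cox_neg_loglik_grad`, `soft_threshold`, `lasso_cox`) are hypothetical.

```python
import numpy as np


def cox_neg_loglik_grad(beta, X, time, event):
    """Average negative log partial likelihood and gradient (assumes no ties)."""
    order = np.argsort(-time)                    # descending time: risk sets are prefixes
    X, event = X[order], event[order]
    eta = X @ beta
    c = eta.max()                                # shift for numerical stability
    r = np.exp(eta - c)
    S0 = np.cumsum(r)                            # sum of exp(eta_j) over {j: t_j >= t_i}
    S1 = np.cumsum(r[:, None] * X, axis=0)       # matching weighted covariate sums
    n = X.shape[0]
    nll = -np.sum(event * (eta - (np.log(S0) + c))) / n
    grad = -np.sum(event[:, None] * (X - S1 / S0[:, None]), axis=0) / n
    return nll, grad


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cox(X, time, event, lam, step=1.0, n_iter=2000, tol=1e-8):
    """Proximal gradient descent for the LASSO-penalized Cox partial likelihood."""
    beta = np.zeros(X.shape[1])
    f, g = cox_neg_loglik_grad(beta, X, time, event)
    for _ in range(n_iter):
        while True:                              # backtracking line search
            cand = soft_threshold(beta - step * g, step * lam)
            f_new, g_new = cox_neg_loglik_grad(cand, X, time, event)
            d = cand - beta
            if f_new <= f + g @ d + (d @ d) / (2 * step):
                break
            step *= 0.5
        converged = np.max(np.abs(d)) < tol
        beta, f, g = cand, f_new, g_new
        if converged:
            break
    return beta
```

Within an EM algorithm, a penalized update of this kind would plausibly replace the unpenalized update of β in the M-step. The summary does not describe how the paper handles this step, so the sketch shows only the generic mechanism.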
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Distribution Estimation and Applications · Bayesian Methods and Mixture Models
