Simultaneous Estimation and Model Choice for Big Discrete Time-to-Event Data with Additive Predictors
Benjamin Müller, Nikolaus Umlauf, Johannes Seiler, Kenneth Harttgen, Stefan Lang

TL;DR
This paper extends a scalable algorithm for estimation and variable selection to discrete-time hazard models, enabling efficient analysis of large, complex survival datasets with high-dimensional covariates.
Contribution
It introduces an extension of the Batchwise Backfitting algorithm for discrete hazard models, improving scalability, accuracy, and variable selection in high-dimensional settings.
Findings
Accurate estimates in simulated data
Automatic variable selection
Efficient scaling to large datasets
Abstract
Discrete-time hazard models are widely used when event times are measured in intervals or are not precisely observed. While these models can be estimated using standard generalized linear model techniques, they rely on extensive data augmentation, making estimation computationally demanding in high-dimensional settings. In this paper, we demonstrate how the recently proposed Batchwise Backfitting algorithm, a general framework for scalable estimation and variable selection in distributional regression, can be effectively extended to discrete hazard models. Using both simulated data and a large-scale application on infant mortality in sub-Saharan Africa, we show that the algorithm delivers accurate estimates, automatically selects relevant predictors, and scales efficiently to large data sets. The findings underscore the algorithm's practical utility for analysing large-scale, complex…
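The data augmentation mentioned in the abstract is commonly done by expanding each subject into one row per discrete interval at risk, with a binary response that equals 1 only in the interval where the event occurs. The following is a minimal sketch of that expansion, not the authors' implementation; the function name `expand_person_period` and the toy inputs are hypothetical, and it assumes intervals are numbered from 1 and that `events` uses 1 for an observed event and 0 for censoring.

```python
def expand_person_period(times, events):
    """Expand (time, event) pairs into person-period rows.

    times[i]  : last discrete interval (1-based) in which subject i was observed
    events[i] : 1 if the event occurred in that interval, 0 if censored
    Returns a list of (subject, interval, y) tuples, one per interval at risk.
    """
    rows = []
    for subject, (t, d) in enumerate(zip(times, events)):
        for interval in range(1, t + 1):
            # The binary response is 1 only in the final interval of a subject
            # whose event was observed; censored subjects contribute only zeros.
            y = 1 if (d == 1 and interval == t) else 0
            rows.append((subject, interval, y))
    return rows


# Toy data (illustrative only): three subjects observed for 3, 2 and 4 intervals.
rows = expand_person_period([3, 2, 4], [1, 0, 1])
for row in rows:
    print(row)
```

Three subjects become nine rows here, because the augmented data set has as many rows as the sum of the observed interval counts. This growth is why fitting the resulting binary regression can become computationally demanding for large samples with long follow-up.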
