Simultaneous Estimation and Model Choice for Big Discrete Time-to-Event Data with Additive Predictors

Benjamin Müller; Nikolaus Umlauf; Johannes Seiler; Kenneth Harttgen; Stefan Lang

arXiv:2507.08099·stat.ME·July 14, 2025

TL;DR

This paper extends a scalable algorithm for simultaneous estimation and variable selection to discrete-time hazard models, enabling efficient analysis of large, complex survival datasets with high-dimensional covariates.

Contribution

It extends the Batchwise Backfitting algorithm to discrete hazard models, providing scalable estimation and automatic variable selection in high-dimensional settings.
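For orientation, a discrete hazard model with an additive predictor is commonly written as below. This is the standard formulation from the discrete survival literature with a logit link and time-constant covariates, assumed here; the paper's exact parameterization may differ. Here t_i is the interval in which subject i's event or censoring is observed and δ_i is the event indicator. The second line shows why estimation reduces to a binary GLM: subject i contributes t_i Bernoulli observations y_is.

```latex
\begin{aligned}
\lambda(t \mid \mathbf{x}_i) &= P(T_i = t \mid T_i \ge t, \mathbf{x}_i) = h(\eta_{it}),
  \qquad \eta_{it} = \gamma_0(t) + \sum_{j=1}^{p} f_j(x_{ij}),
  \qquad h(\eta) = \frac{1}{1 + \exp(-\eta)}, \\
L_i &= \lambda(t_i \mid \mathbf{x}_i)^{\delta_i}
  \bigl(1 - \lambda(t_i \mid \mathbf{x}_i)\bigr)^{1 - \delta_i}
  \prod_{s=1}^{t_i - 1} \bigl(1 - \lambda(s \mid \mathbf{x}_i)\bigr)
  = \prod_{s=1}^{t_i} \lambda(s \mid \mathbf{x}_i)^{y_{is}}
  \bigl(1 - \lambda(s \mid \mathbf{x}_i)\bigr)^{1 - y_{is}},
  \qquad y_{is} = \delta_i \, \mathbb{1}(s = t_i).
\end{aligned}
```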

Findings

01. Accurate estimates in simulated data
02. Automatic variable selection
03. Efficient scaling to large datasets

Abstract

Discrete-time hazard models are widely used when event times are measured in intervals or are not precisely observed. While these models can be estimated using standard generalized linear model techniques, they rely on extensive data augmentation, making estimation computationally demanding in high-dimensional settings. In this paper, we demonstrate how the recently proposed Batchwise Backfitting algorithm, a general framework for scalable estimation and variable selection in distributional regression, can be effectively extended to discrete hazard models. Using both simulated data and a large-scale application on infant mortality in sub-Saharan Africa, we show that the algorithm delivers accurate estimates, automatically selects relevant predictors, and scales efficiently to large data sets. The findings underscore the algorithm's practical utility for analysing large-scale, complex…
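The data augmentation mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes a logit link, a single time-constant covariate with a linear effect, administrative censoring after K intervals, and one dummy per interval for the baseline hazard γ_0(t). The function names (`to_person_period`, `design`, `fit_logit`) and the simulated parameters are hypothetical, and the fit uses plain Newton-Raphson on the full augmented data, not the paper's Batchwise Backfitting algorithm.

```python
import numpy as np


def to_person_period(times, events, X):
    """Expand subject-level data to one row per subject and interval at risk.

    times: observed interval t_i (1-based); events: 1 if the event occurred in t_i.
    """
    rows_t, rows_y, rows_x = [], [], []
    for t_i, d_i, x_i in zip(times, events, X):
        for s in range(1, int(t_i) + 1):
            rows_t.append(s)
            # y = 1 only in the last interval, and only if the event was observed
            rows_y.append(1 if (s == t_i and d_i == 1) else 0)
            rows_x.append(x_i)
    return np.array(rows_t), np.array(rows_y, dtype=float), np.array(rows_x)


def design(interval, Xpp, n_intervals):
    # One dummy column per interval gives a piecewise-constant baseline gamma_0(t)
    B = np.zeros((len(interval), n_intervals))
    B[np.arange(len(interval)), interval - 1] = 1.0
    return np.hstack([B, Xpp])


def fit_logit(Z, y, n_iter=25):
    # Newton-Raphson (equivalently IRLS) for a logistic GLM on the augmented rows
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))
        W = mu * (1.0 - mu)
        grad = Z.T @ (y - mu)
        H = Z.T @ (Z * W[:, None])
        beta = beta + np.linalg.solve(H, grad)
    return beta


# Hypothetical simulation: logit(lambda(t | x)) = gamma_t + beta_true * x
rng = np.random.default_rng(0)
n, K = 4000, 5
gamma = np.array([-2.0, -1.8, -1.6, -1.4, -1.2])
beta_true = 0.8
x = rng.normal(size=n)
times = np.full(n, K)
events = np.zeros(n, dtype=int)
for i in range(n):
    for t in range(1, K + 1):
        lam = 1.0 / (1.0 + np.exp(-(gamma[t - 1] + beta_true * x[i])))
        if rng.random() < lam:
            times[i], events[i] = t, 1
            break  # subjects surviving all K intervals stay censored at K

interval, y, Xpp = to_person_period(times, events, x[:, None])
Z = design(interval, Xpp, K)
coef = fit_logit(Z, y)
print("rows after augmentation:", len(y), "from", n, "subjects")
print("baseline gamma_0(t):", np.round(coef[:K], 2))
print("covariate effect:", round(float(coef[K]), 2))
```

Each subject contributes t_i rows, so the augmented data grow with follow-up length, and every Newton step works on the full augmented design. This is the cost that becomes demanding with large samples and many covariates, which is the setting the paper's algorithm targets.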