Boosting methods for interval-censored data with regression and classification

Yuan Bian; Grace Y. Yi; Wenqing He

arXiv:2601.17973·stat.ML·February 19, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces novel nonparametric boosting methods tailored for regression and classification with interval-censored data, addressing a common challenge in survival analysis and related fields.

Contribution

The work develops scalable boosting algorithms using censoring unbiased transformations, with theoretical guarantees and improved performance on interval-censored data.

Findings

01 Robust performance demonstrated in finite-sample scenarios

02 Theoretical properties including optimality established

03 Methods effectively handle complex censoring structures

Abstract

Boosting has garnered significant interest across both machine learning and statistical communities. Traditional boosting algorithms, designed for fully observed random samples, often struggle with real-world problems, particularly with interval-censored data. This type of data is common in survival analysis and time-to-event studies where exact event times are unobserved but fall within known intervals. Effective handling of such data is crucial in fields like medical research, reliability engineering, and social sciences. In this work, we introduce novel nonparametric boosting methods for regression and classification tasks with interval-censored data. Our approaches leverage censoring unbiased transformations to adjust loss functions and impute transformed responses while maintaining model accuracy. Implemented via functional gradient descent, these methods ensure scalability and…
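
As a rough illustration of the two ingredients the abstract names, the sketch below first imputes a transformed response for each interval-censored observation and then runs functional gradient descent on squared loss. It is a minimal sketch under stated assumptions, not the authors' algorithm: the transformation used here is the conditional mean E[T | L < T ≤ R, x], the conditional survival curve is taken as known from the simulation (in practice it would have to be estimated), and the base learner is a one-split regression stump. The function names (`impute_interval_mean`, `fit_stump`, `predict_stump`, `l2_boost`, `predict`) and the simulated data are hypothetical.

```python
import numpy as np

def impute_interval_mean(left, right, grid, surv):
    """Return E[T | left_i < T <= right_i, x_i] for each subject i.

    surv[i, k] = S(grid[k] | x_i) = P(T > grid[k] | x_i). The distribution is
    discretised so grid[k] carries mass S(grid[k-1] | x_i) - S(grid[k] | x_i).
    """
    n = len(left)
    mass = -np.diff(np.hstack([np.ones((n, 1)), surv]), axis=1)
    out = np.empty(n)
    for i in range(n):
        inside = (grid > left[i]) & (grid <= right[i])
        w = mass[i, inside]
        total = w.sum()
        # Assumption: fall back to the midpoint if no mass lies in the interval.
        out[i] = w @ grid[inside] / total if total > 0 else 0.5 * (left[i] + right[i])
    return out

def fit_stump(X, residual):
    """Least-squares regression stump: best single split over all features."""
    best = (np.inf, 0, 0.0, residual.mean(), residual.mean())
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:  # dropping the max keeps both sides non-empty
            mask = X[:, j] <= thr
            lo, hi = residual[mask].mean(), residual[~mask].mean()
            sse = ((residual[mask] - lo) ** 2).sum() + ((residual[~mask] - hi) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, thr, lo, hi)
    return best[1:]

def predict_stump(stump, X):
    j, thr, lo, hi = stump
    return np.where(X[:, j] <= thr, lo, hi)

def l2_boost(X, y_star, n_steps=50, lr=0.1):
    """Functional gradient descent on 0.5 * (y_star - f(x))^2."""
    f0 = y_star.mean()
    pred = np.full(len(y_star), f0)
    stumps, losses = [], []
    for _ in range(n_steps):
        residual = y_star - pred            # negative gradient of the squared loss
        stump = fit_stump(X, residual)
        pred += lr * predict_stump(stump, X)
        stumps.append(stump)
        losses.append(np.mean((y_star - pred) ** 2))
    return f0, stumps, losses

def predict(f0, stumps, X, lr=0.1):
    return f0 + lr * sum(predict_stump(s, X) for s in stumps)

# Simulated example (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(size=(n, 1))
scale = np.exp(X[:, 0])
T = rng.exponential(scale)                  # latent event times, never used for fitting
left = np.floor(T / 0.5) * 0.5              # inspections every 0.5 time units
right = left + 0.5
grid = np.linspace(0.01, 30.0, 3000)
surv = np.exp(-grid[None, :] / scale[:, None])  # true S(t | x); estimated in practice
y_star = impute_interval_mean(left, right, grid, surv)
f0, stumps, losses = l2_boost(X, y_star)
print(f"training MSE after {len(losses)} steps: {losses[-1]:.4f}")
```

In this sketch only the `residual` line depends on the loss, so a different differentiable loss would change the pseudo-residuals while the stump-fitting step stays the same.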

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- The paper is mathematically well written. The notation is consistent and precise. Definitions and proofs are provided in the Appendix.
- The theoretical results are novel and clearly structured and formulated. The implications of each theoretical result are well discussed and, where relevant, framed within the context of the literature.
- The proposed algorithms are tested experimentally on synthetic data under various scenarios. The method is further applied to a real-life dataset.

Weaknesses

1) An underlying part of the model is the estimation of the survival function. It would be interesting to include experiments with different survival function estimators to assess how sensitive the method is to potential biases in the estimator of the survival times, as this part of the model is not properly covered by the presented theory.
2) It is not clear how sensitive the proposed method is to noise in the underlying data.
3) The authors did not provide much insight into how well the algorithm s

Reviewer 02 · Rating 5 · Confidence 3

Strengths

The article is clearly written and provides a thorough theoretical analysis of the proposed algorithm, which is important for grounding further applied work that uses this method in practice.

Weaknesses

There are several limitations of this work.
- My main concern is that the paper does not clarify the empirical improvement over ICRF. Using boosted trees is indeed slightly different from using random forests (bagging), but the article fails to show a clear gain in performance on either simulated or real data.
- There should be more baselines, such as Cox models, even if they are designed for right-censored rather than interval-censored data. There are very few baselines included

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The manuscript is clearly written and accessible, making the methodology easy to follow.
- The authors conduct a rigorous theoretical analysis of their proposed methods, assessing mean squared error (MSE), variance, and bias to establish a solid foundation for their approach.

Weaknesses

- In Proposition 2, using a distinct notation for the smoother matrix would help prevent confusion with the survival function $S$.
- Adding experiments for the Cox proportional hazards model would strengthen the manuscript's applicability and support its conclusions.
- Additional ensemble methods exist for interval-censored data, such as:
  - Yao, W., Frydman, H., and Simonoff, J.S., 2021. An ensemble method for interval-censored time-to-event data. Biostatistics, 22(1), pp.198-213.
  Includi

Taxonomy

Topics: Statistical Methods and Inference · Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification