Boosting methods for interval-censored data with regression and classification
Yuan Bian, Grace Y. Yi, Wenqing He

TL;DR
This paper introduces novel nonparametric boosting methods tailored for regression and classification with interval-censored data, addressing a common challenge in survival analysis and related fields.
Contribution
The work develops scalable boosting algorithms using censoring unbiased transformations, with theoretical guarantees and improved performance on interval-censored data.
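As a rough illustration of the term (a standard textbook-style formulation, not necessarily the exact construction used in the paper): write $Y$ for the unobserved response, $X$ for the covariates, and $O$ for the observed interval-censored data. A transformed response $Y^*(O)$ is then called censoring unbiased when

$$E\{Y^*(O) \mid X\} = E(Y \mid X),$$

and a transformed loss $L^*$ is censoring unbiased for a loss $L$ when

$$E\{L^*(O, f(X)) \mid X\} = E\{L(Y, f(X)) \mid X\}$$

for every candidate predictor $f$. The symbols $O$, $Y^*$, $L^*$, and $f$ are notation assumed here for illustration only.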
Findings
Robust performance demonstrated in finite-sample scenarios
Theoretical properties including optimality established
Methods effectively handle complex censoring structures
Abstract
Boosting has garnered significant interest across both machine learning and statistical communities. Traditional boosting algorithms, designed for fully observed random samples, often struggle with real-world problems, particularly with interval-censored data. This type of data is common in survival analysis and time-to-event studies where exact event times are unobserved but fall within known intervals. Effective handling of such data is crucial in fields like medical research, reliability engineering, and social sciences. In this work, we introduce novel nonparametric boosting methods for regression and classification tasks with interval-censored data. Our approaches leverage censoring unbiased transformations to adjust loss functions and impute transformed responses while maintaining model accuracy. Implemented via functional gradient descent, these methods ensure scalability and…
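The two ingredients named in the abstract, imputing a transformed response and fitting by functional gradient descent, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the imputation is a plain conditional mean of the response within its observed interval, the survival curve is an assumed exponential rather than an estimate, the base learner is a one-dimensional regression stump, and all function names (`impute_interval_mean`, `fit_stump`, `l2_boost`, `predict`) are hypothetical.

```python
import numpy as np


def impute_interval_mean(left, right, grid, surv):
    """Conditional mean of Y given left < Y <= right.

    `surv` holds survival values S(t) on `grid`. Here S is an assumed
    input; in practice it would be estimated from the data.
    """
    # Probability mass at each grid point: S(t_{k-1}) - S(t_k), with S(0) = 1.
    mass = -np.diff(np.concatenate(([1.0], surv)))
    out = np.empty(len(left))
    for i, (lo, hi) in enumerate(zip(left, right)):
        inside = (grid > lo) & (grid <= hi)
        total = mass[inside].sum()
        # Fall back to the interval midpoint if no mass lies inside.
        out[i] = (mass[inside] @ grid[inside]) / total if total > 0 else 0.5 * (lo + hi)
    return out


def fit_stump(x, residual):
    """Least-squares regression stump on a single feature."""
    best = None
    for split in np.unique(x)[:-1]:
        mask = x <= split
        c_left, c_right = residual[mask].mean(), residual[~mask].mean()
        sse = ((residual - np.where(mask, c_left, c_right)) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, split, c_left, c_right)
    if best is None:  # all x identical: fit a constant
        m = residual.mean()
        return x[0], m, m
    return best[1:]


def l2_boost(x, y_star, n_rounds=50, lr=0.1):
    """Functional gradient descent for squared loss on imputed responses."""
    f0 = y_star.mean()
    pred = np.full(len(y_star), f0, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y_star - pred  # negative gradient of 0.5 * (y - f)^2
        split, c_left, c_right = fit_stump(x, residual)
        pred += lr * np.where(x <= split, c_left, c_right)
        stumps.append((split, c_left, c_right))
    return f0, stumps, lr


def predict(model, x):
    f0, stumps, lr = model
    out = np.full(len(x), f0, dtype=float)
    for split, c_left, c_right in stumps:
        out += lr * np.where(x <= split, c_left, c_right)
    return out


# Toy data: the true response is only known up to a unit-width interval.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.2, 200)
left, right = np.floor(y), np.floor(y) + 1.0

grid = np.linspace(0.05, 10.0, 200)
surv = np.exp(-grid / 3.0)  # assumed survival curve, for illustration only

y_star = impute_interval_mean(left, right, grid, surv)
model = l2_boost(x, y_star)
train_mse = np.mean((y_star - predict(model, x)) ** 2)
```

The point of the design is that once the censored response is replaced by `y_star`, the boosting loop is ordinary L2 boosting, so any standard base learner could be substituted for the stump.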
Peer Reviews
Decision: ICLR 2025 Poster
- The paper is mathematically well written. The notation is consistent and precise. Definitions and proofs are provided in the Appendix.
- The theoretical results are novel, clearly structured, and clearly formulated. The implications of each theoretical result are well discussed and, where relevant, framed within the context of the literature.
- The proposed algorithms are tested experimentally on synthetic data under various scenarios. The method is further applied to a real-life dataset.
1) The underlying part of the model is estimating the survival function. It would be interesting to include experiments with different survival function estimators to assess how sensitive the method is to potential biases in the estimator of the survival times, as this part of the model is not properly covered by the presented theory.
2) It is not clear how sensitive the proposed method is to noise in the underlying data.
3) The authors did not provide much insight into how well the algorithm s
The article is clearly written and provides a thorough theoretical analysis of the proposed algorithm, which is important for grounding further applied work that uses this method in practice.
There are several limitations of this work.
- My main concern is that the paper does not clarify the empirical improvement over ICRF. Boosted trees do differ somewhat from random forests (bagging), but the article fails to show a clear gain in performance on either simulated or real data.
- There should be more baselines, such as Cox models, even if they are designed for right-censored rather than interval-censored data. There are very few baselines included
- The manuscript is clearly written and accessible, making the methodology easy to follow.
- The authors conduct a rigorous theoretical analysis of their proposed methods, assessing mean squared error (MSE), variance, and bias to establish a solid foundation for their approach.
- In Proposition 2, using a distinct notation for the smoother matrix would help prevent confusion with the survival function $S$.
- Adding experiments for the Cox proportional hazards model would strengthen the manuscript's applicability and support its conclusions.
- Additional ensemble methods exist for interval-censored data, such as:
  - Yao, W., Frydman, H., and Simonoff, J.S., 2021. An ensemble method for interval-censored time-to-event data. Biostatistics, 22(1), pp.198-213. Includi
Taxonomy
Topics: Statistical Methods and Inference · Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification
