Modeling High-Dimensional Data with Unknown Cut Points: A Fusion Penalized Logistic Threshold Regression
Yinan Lin, Wen Zhou, Zhi Geng, Gexin Xiao, and Jianxin Yin

TL;DR
This paper introduces a novel high-dimensional logistic regression model that estimates unknown threshold points and coefficients simultaneously, improving disease prediction accuracy with a fusion penalization approach.
Contribution
The paper proposes the FILTER model, which combines threshold estimation and variable selection via a fused lasso penalty, with theoretical guarantees and practical applications in disease prediction.
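One way to write an objective of this kind is sketched below. It is a hedged illustration, not the paper's exact formulation. It assumes the following: feature j is cut at sorted thresholds into K_j ordinal levels, and each level gets its own coefficient beta_{jk}. The intercept beta_0 is unpenalized, and the loss is the average logistic negative log-likelihood. The tuning parameters lambda_1 (sparsity) and lambda_2 (fusion of adjacent levels) are our own notation.

```latex
z_{ijk} = \mathbf{1}\{\tau_{j,k-1} < x_{ij} \le \tau_{j,k}\}, \qquad
\tau_{j,0} = -\infty, \; \tau_{j,K_j} = +\infty,

\eta_i = \beta_0 + \sum_{j=1}^{p} \sum_{k=1}^{K_j} \beta_{jk} \, z_{ijk},

\min_{\beta_0,\, \beta} \;
-\frac{1}{n} \sum_{i=1}^{n} \bigl[ y_i \eta_i - \log\bigl(1 + e^{\eta_i}\bigr) \bigr]
+ \lambda_1 \sum_{j=1}^{p} \sum_{k=1}^{K_j} \lvert \beta_{jk} \rvert
+ \lambda_2 \sum_{j=1}^{p} \sum_{k=2}^{K_j} \lvert \beta_{jk} - \beta_{j,k-1} \rvert .
```

The lambda_1 term shrinks individual level coefficients to zero. The lambda_2 term penalizes the total variation across neighboring ordinal levels of the same feature. In this sketch the thresholds tau are held fixed, whereas in the FILTER setting they are unknown and estimated as well.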
Findings
The FILTER model achieves consistent variable selection and threshold estimation.
Monte Carlo studies validate the theoretical error bounds.
Application to diabetes data demonstrates practical effectiveness.
Abstract
In traditional logistic regression models, the link function is often assumed to be linear and continuous in the predictors. Here, we consider a threshold model in which all continuous features are discretized into ordinal levels, which in turn determine the binary responses. Both the threshold points and the regression coefficients are unknown and must be estimated. For high-dimensional data, we propose a fusion penalized logistic threshold regression (FILTER) model, in which a fused lasso penalty is employed both to control the total variation and to shrink coefficients to zero as a means of variable selection. Under mild conditions on the estimates of the unknown threshold points, we establish a non-asymptotic error bound for coefficient estimation and model selection consistency. With a careful characterization of the error propagation, we have also shown that a tree-based method, such as CART,…
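The following is a minimal, self-contained Python sketch. It assumes the thresholds have already been estimated by some separate step and are passed in as fixed inputs. It shows two things: how continuous features can be mapped to ordinal levels, and how a fused-lasso-penalized logistic loss can be evaluated on those levels. The loss has three parts: the average negative log-likelihood, an L1 penalty (lam1) on every level coefficient, and a total-variation penalty (lam2) on differences between adjacent levels of the same feature. The function names discretize, softplus and filter_objective, the left-open/right-closed interval convention, and the exact penalty form are hypothetical illustrations, not the authors' code. Minimizing the objective (for example, with a proximal-gradient or generic convex solver) is not shown.

```python
import bisect
import math
from typing import List, Sequence


def discretize(x: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Map each value to an ordinal level 0..len(cuts).

    Assumed convention: level k is the interval (cuts[k-1], cuts[k]],
    with an implied -inf below the first cut and +inf above the last.
    """
    # bisect_left counts the cut points strictly below v, which is exactly
    # the index of the left-open, right-closed interval containing v.
    return [bisect.bisect_left(cuts, v) for v in x]


def softplus(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def filter_objective(X, y, cuts, beta0, beta, lam1, lam2):
    """Penalized objective for a logistic threshold model (hypothetical sketch).

    X     : n rows, each a list of p continuous features
    y     : n binary responses (0 or 1)
    cuts  : cuts[j] is the sorted list of thresholds for feature j
    beta0 : unpenalized intercept
    beta  : beta[j] holds len(cuts[j]) + 1 level coefficients for feature j
    lam1  : weight of the L1 (sparsity) penalty
    lam2  : weight of the fused (total-variation) penalty
    """
    n, p = len(y), len(cuts)
    # Ordinal level of every observation for every feature.
    levels = [discretize([row[j] for row in X], cuts[j]) for j in range(p)]

    # Average logistic negative log-likelihood: log(1 + e^eta) - y * eta.
    nll = 0.0
    for i in range(n):
        eta = beta0 + sum(beta[j][levels[j][i]] for j in range(p))
        nll += softplus(eta) - y[i] * eta
    nll /= n

    # L1 penalty shrinks individual level coefficients toward zero.
    lasso = sum(abs(b) for bj in beta for b in bj)
    # Fused penalty: total variation across adjacent ordinal levels.
    fused = sum(abs(bj[k] - bj[k - 1]) for bj in beta for k in range(1, len(bj)))
    return nll + lam1 * lasso + lam2 * fused


if __name__ == "__main__":
    X = [[0.5], [1.5], [2.5]]
    y = [0, 1, 1]
    cuts = [[1.0, 2.0]]  # one feature, two thresholds -> three levels
    print(discretize([0.5, 1.0, 1.5, 2.0, 3.0], cuts[0]))  # [0, 0, 1, 1, 2]
    # With all coefficients zero and no penalty, this should be close to log(2).
    print(filter_objective(X, y, cuts, 0.0, [[0.0, 0.0, 0.0]], 0.0, 0.0))
```

Setting lam2 = 0 reduces the penalty to an ordinary lasso on the level indicators. A large lam2 pushes adjacent levels of a feature toward a common coefficient, and a threshold separating two levels with equal coefficients no longer affects the fitted model.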
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Statistical Methods and Bayesian Inference
Methods: Logistic Regression
