Simultaneous Factors Selection and Fusion of Their Levels in Penalized Logistic Regression
Lea Kaufmann, Maria Kateri

TL;DR
This paper introduces a new regularization method called $L_{0}$-Fused Group Lasso for binary logistic regression. It reduces model complexity by selecting variables and fusing levels of categorical covariates, and is supported by theoretical results (consistency and oracle properties) and by empirical evaluation.
Contribution
The paper proposes the $L_{0}$-Fused Group Lasso method, combining variable selection and level fusion for categorical predictors, with theoretical guarantees and efficient algorithms.
Findings
$L_{0}$-FGL achieves high accuracy in variable selection.
The method performs well in high-dimensional settings.
Its theoretical properties include consistency and oracle properties.
Abstract
Many data analysis problems today call for complexity reduction, which mainly means removing non-influential covariates and delivering a sparse model. When categorical covariates are present, with their levels dummy coded, the number of parameters in the model grows rapidly, which emphasizes the need to reduce the number of parameters to be estimated. In this case, beyond variable selection, sparsity is also achieved by fusing levels of covariates that do not differ significantly in their influence on the response variable. In this work, a new regularization technique for binary logistic regression is introduced, called $L_{0}$-Fused Group Lasso ($L_{0}$-FGL). It uses a group lasso penalty for factor selection and, for the fusion part, applies an $L_{0}$ penalty on the differences among the…
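The two-part penalty described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several details are assumptions made only for illustration: the function name `l0_fgl_penalty`, the tuning parameters `lam_group` and `lam_fuse`, the unweighted group norm, the use of all pairwise level differences, and the inclusion of a reference level with coefficient 0. The paper's exact definition and weighting may differ.

```python
import numpy as np

def l0_fgl_penalty(beta_groups, lam_group, lam_fuse, tol=1e-8):
    """Hypothetical sketch: group lasso term plus an L0 count of level differences.

    beta_groups: list of 1-D arrays, one per factor, holding the dummy-coded
                 coefficients of that factor's non-reference levels.
    """
    group_term = 0.0
    fusion_count = 0
    for beta in beta_groups:
        beta = np.asarray(beta, dtype=float)
        # Group lasso part: Euclidean norm of the whole factor, which is
        # zero only if every level coefficient of the factor is zero.
        group_term += np.linalg.norm(beta)
        # Assumption: the reference level has coefficient 0 and can be
        # fused with other levels, so it is prepended here.
        levels = np.concatenate(([0.0], beta))
        # L0 part: count pairs of levels whose coefficients differ.
        diffs = np.abs(levels[:, None] - levels[None, :])
        upper = np.triu_indices(levels.size, k=1)
        fusion_count += int(np.sum(diffs[upper] > tol))
    return lam_group * group_term + lam_fuse * fusion_count

# One excluded factor and one factor with two fused levels.
penalty = l0_fgl_penalty([[0.0, 0.0], [1.0, 1.0, 2.0]], lam_group=1.0, lam_fuse=1.0)
```

In this sketch, a factor whose coefficients are all zero contributes nothing to either term, which corresponds to dropping it from the model. A factor with coefficients (1, 1, 2) contributes five differing pairs out of six, because two of its levels share a coefficient and count as fused.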
Taxonomy
Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Advanced Statistical Process Monitoring
