A XGBoost risk model via feature selection and Bayesian hyper-parameter optimization
Yan Wang, Xuelei Sherry Ni

TL;DR
This study develops an XGBoost-based risk classification model using feature selection and Bayesian hyper-parameter optimization, demonstrating superior performance over logistic regression in business risk prediction.
Contribution
It introduces the combined use of multiple feature selection methods and Bayesian TPE hyper-parameter tuning to enhance XGBoost model performance for business risk classification.
Findings
XGBoost with TPE outperforms logistic regression significantly.
Chi-square feature selection yields the best XGBoost performance.
Bayesian TPE tuning results in more stable and accurate models.
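To make the chi-square finding concrete, the sketch below shows one way a "weight by Chi-square" ranking can be computed: each categorical feature is scored by the Pearson chi-square statistic of its contingency table against the class label, and features are sorted by that score. This is an illustration under assumptions, not the authors' pipeline. The feature names (`late_payment`, `region_flag`), the helper names, and the toy data are all invented, and the summary does not say how continuous features were binned.

```python
from collections import Counter


def chi_square_score(feature, target):
    """Pearson chi-square statistic between a categorical feature and a class label."""
    n = len(feature)
    joint = Counter(zip(feature, target))   # observed cell counts
    f_counts = Counter(feature)             # row totals
    t_counts = Counter(target)              # column totals
    stat = 0.0
    for f, fc in f_counts.items():
        for t, tc in t_counts.items():
            expected = fc * tc / n          # count expected under independence
            observed = joint.get((f, t), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_features(columns, target):
    """Return (name, score) pairs sorted from most to least associated with the target."""
    scores = {name: chi_square_score(col, target) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented purely for illustration (hypothetical feature names).
target = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "late_payment": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with the label
    "region_flag": [1, 0, 1, 0, 1, 0, 1, 0],   # independent of the label
}
ranking = rank_features(columns, target)
print(ranking)
```

A higher score means a stronger association with the label, so a selection step would keep the top-ranked features and drop the rest before model training. How many features to keep is a separate choice that this sketch does not address.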
Abstract
This paper aims to explore models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimizations are simultaneously considered during model training. The five most commonly used FS methods, including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information, are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and Bayesian tree-structured Parzen Estimator (TPE), are applied in XGBoost. The effects of different FS and hyper-parameter optimization methods on the model performance are investigated by the Wilcoxon Signed Rank Test. The performance of XGBoost is compared to the traditionally utilized logistic regression (LR) model in terms of classification accuracy,…
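The Wilcoxon Signed Rank Test mentioned in the abstract compares two methods on paired measurements. The sketch below computes its rank sums in plain Python, assuming the pairs are scores of two configurations on the same data splits. That pairing, the function name, and the integer scores are assumptions made for illustration. They are not values from the paper, and the summary does not describe the paper's exact test setup.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, n) for paired samples.

    Zero differences are dropped and tied absolute differences share
    their average rank, as in the standard form of the test.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied absolute differences starting at i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, n


# Hypothetical paired scores (e.g. correctly classified cases per split).
scores_a = [90, 88, 93, 85, 91, 87]
scores_b = [86, 89, 90, 85, 88, 82]
w_plus, w_minus, n = wilcoxon_signed_rank(scores_a, scores_b)
print(w_plus, w_minus, n)
```

The test statistic is the smaller of the two rank sums. A small value relative to n(n+1)/2 suggests a consistent difference between the two configurations. This sketch stops at the rank sums; for a p-value, `scipy.stats.wilcoxon` provides a full implementation.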
Taxonomy
Topics: Imbalanced Data Classification Techniques · Financial Distress and Bankruptcy Prediction · Machine Learning and Data Classification
