Research on key indicators for diagnosis and prediction of rheumatoid arthritis based on GBDT+LR embedded feature selection model
Jiaqi Chen, Qiang Zhang, Zhenqiang Huang, Chunsheng Qu

TL;DR
This paper introduces a new model combining GBDT and LR to identify key clinical indicators for diagnosing rheumatoid arthritis more accurately and efficiently.
Contribution
A novel embedded feature selection framework using GBDT+LR and SHAP to identify critical biomarkers for RA diagnosis.
Findings
The model achieved high diagnostic accuracy and specificity for rheumatoid arthritis.
SHAP analysis revealed systemic metabolic indicators as important diagnostic markers.
The framework outperformed conventional methods in test accuracy and AUC.
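For reference, SHAP attributes a model's prediction to individual input features using Shapley values. The standard definition is given below. The notation is introduced here for illustration and is not taken from the paper: F is the full feature set, S is a subset that excludes feature i, and f_S(x) is the model's expected output when only the features in S are known.

```latex
\phi_i(x) = \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
  \Bigl[ f_{S \cup \{i\}}(x) - f_S(x) \Bigr]
```

A feature with a large mean absolute value of \phi_i across patients contributes strongly to the model's output, which is the sense in which an indicator is usually called important in a SHAP analysis.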
Abstract
Rheumatoid arthritis (RA) exhibits substantial diagnostic overlap with other autoimmune diseases that share similar pathological features, leading to redundant testing and limited diagnostic specificity. Therefore, there is an urgent need to identify critical clinical indicators with high diagnostic and predictive value to improve both diagnostic efficiency and accuracy. To address this challenge, we propose a multidimensional embedded feature selection framework based on ensemble learning. This framework integrates Gradient Boosted Decision Trees (GBDT) and Logistic Regression (LR) models to extract potential diagnostic features from multi-source clinical datasets. GBDT captures complex nonlinear interactions among features, enhancing adaptability to heterogeneous data, while LR leverages its sparsity-promoting characteristics to perform dimensionality reduction and highlight…
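The truncated abstract does not say how the GBDT and LR stages are coupled. One common way to combine them, sketched here as an assumption rather than as the authors' method, is leaf encoding: each sample is mapped to the leaf it reaches in every boosted tree, the leaf indices are one-hot encoded, and an L1-penalized logistic regression is fitted on the result so that uninformative leaves get zero weight. The synthetic data, the three-way split, and all hyperparameters below are illustrative choices, using scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for a clinical indicator table (assumption: not the paper's data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, n_redundant=2, random_state=0
)

# Fit the GBDT and the LR on disjoint rows so the LR does not see leaves
# that were tuned on its own training samples; hold out a third part for testing.
X_gbdt, X_rest, y_gbdt, y_rest = train_test_split(
    X, y, test_size=0.6, random_state=0, stratify=y
)
X_lr, X_test, y_lr, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest
)

# Stage 1: boosted trees learn nonlinear feature interactions.
gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
gbdt.fit(X_gbdt, y_gbdt)


def leaf_indices(model, data):
    """Return the leaf reached in each tree, shape (n_samples, n_estimators)."""
    # apply() returns (n_samples, n_estimators, 1) for a binary problem.
    return model.apply(data)[:, :, 0]


# Stage 2: one-hot encode the leaves and fit a sparse (L1) logistic regression.
encoder = OneHotEncoder(handle_unknown="ignore")
Z_lr = encoder.fit_transform(leaf_indices(gbdt, X_lr))

lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
lr.fit(Z_lr, y_lr)

# Evaluate the stacked model on the held-out rows.
Z_test = encoder.transform(leaf_indices(gbdt, X_test))
test_auc = roc_auc_score(y_test, lr.predict_proba(Z_test)[:, 1])

# Sparsity of the LR stage and the inputs the GBDT stage relied on most.
n_leaf_features = Z_lr.shape[1]
n_kept = int(np.count_nonzero(lr.coef_))
top_inputs = np.argsort(gbdt.feature_importances_)[::-1][:5]

print(f"test AUC: {test_auc:.3f}")
print(f"leaf features kept by L1: {n_kept} of {n_leaf_features}")
print(f"top input columns by GBDT importance: {top_inputs.tolist()}")
```

In this sketch the L1 penalty acts on leaf features, not on the original indicators, so ranking the original indicators still relies on the GBDT importances or on a separate attribution method such as SHAP.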
Taxonomy
Topics: Rheumatoid Arthritis Research and Therapies · Imbalanced Data Classification Techniques · Artificial Intelligence in Healthcare
