# Stability-Driven Osteoporosis Screening: Multi-View Consensus Feature Selection with External Validation and Sensitivity Analysis

**Authors:** Waragunt Waratamrongpatai, Watcharaporn Cholamjiak, Nontawat Eiamniran, Phatcharapon Udomluck

PMC · DOI: 10.3390/jcm15020677 · 2026-01-14

## TL;DR

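This study evaluates how stable established osteoporosis risk factors are under statistical and machine learning-based feature selection, and whether simplified models built on age and medication variables can match full-feature models. On the development data they do (accuracy ≈91%, AUC ≈ 0.95), although performance declines under external validation.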

## Contribution

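The study introduces a stability-driven approach to feature selection in osteoporosis screening, developing minimal and near-minimal models on an open-access Kaggle dataset (n = 1958) and validating them externally on a single-center hospital cohort (n = 176).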

## Key findings

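- Age was consistently the dominant predictor of osteoporosis risk, followed by corticosteroid use; other variables added limited incremental predictive value.
- Simplified models using age alone or age plus medication-related variables achieved performance comparable to full-feature models (accuracy ≈91%, AUC ≈ 0.95).
- Overall performance declined under external validation, where naïve Bayes and efficient linear classifiers were the most stable (AUC = 0.728–0.787).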
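The gap between internal and external AUC in the findings above can be made concrete with a small sketch. This is not the authors' evaluation code: `roc_auc` and `auc_drop` are hypothetical helper names, and the toy labels and scores are invented for illustration only. The AUC here is computed as the fraction of positive/negative pairs that the model orders correctly, with ties counted as half.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


Cohort = Tuple[Sequence[int], Sequence[float]]


def auc_drop(internal: Cohort, external: Cohort) -> float:
    """Internal AUC minus external AUC; larger values mean less stable behavior."""
    return roc_auc(*internal) - roc_auc(*external)


# Toy data, NOT taken from the paper: (true labels, predicted risk scores).
internal = ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
external = ([0, 0, 1, 1], [0.2, 0.6, 0.5, 0.55])

print(roc_auc(*internal), roc_auc(*external), auc_drop(internal, external))
```

Comparing this drop across classifiers is one simple way to express "stability" under distributional shift; the pairwise loop is quadratic, which is acceptable for cohorts of the size discussed here.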

## Abstract

Background/Objectives: Osteoporosis is a major global health concern, and early risk assessment plays a crucial role in fracture prevention. Although demographic, clinical, and lifestyle factors are commonly incorporated into screening tools, their relative importance within data-driven prediction frameworks can vary substantially across datasets. Rather than aiming to identify novel predictors, this study evaluates the stability and behavior of established osteoporosis risk factors using statistical inference and machine learning-based feature selection methods across heterogeneous data sources. We further examine whether simplified and near-minimal models can achieve predictive performance comparable to that of full-feature configurations.

Methods: An open-access Kaggle dataset (n = 1958) and a retrospective clinical dataset from the University of Phayao Hospital (n = 176) were analyzed. Feature relevance was assessed using logistic regression, likelihood ratio testing, MRMR, ReliefF, and unified importance scoring. Multiple predictor configurations, ranging from full-feature to minimal and near-minimal models, were evaluated using decision tree, support vector machine, k-nearest neighbor, naïve Bayes, and efficient linear classifiers. External validation was performed using hospital-based records.

Results: Across all analyses, age consistently emerged as the dominant predictor, followed by corticosteroid use, while other variables showed limited incremental predictive contributions. Simplified models based on age alone or age combined with medication-related variables achieved performance comparable to full-feature models (accuracy ≈91% and AUC ≈ 0.95). In addition, near-minimal models incorporating gender alongside age and medications demonstrated a favorable balance between discrimination and computational efficiency under external validation. Although overall performance declined under distributional shift, naïve Bayes and efficient linear classifiers showed the most stable external behavior (AUC = 0.728–0.787).

Conclusions: These findings indicate that stability-driven feature selection primarily reproduces well-established epidemiological risk patterns rather than identifying novel predictors. Minimal and near-minimal models, including those incorporating gender, retain acceptable performance under external validation and are methodologically efficient. Given the limited size and single-center nature of the external cohort, the results should be interpreted as preliminary methodological evidence rather than definitive support for clinical screening deployment. Further multi-center studies are required to assess generalizability and clinical relevance.
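The abstract names several feature-relevance methods and a "unified importance scoring" step but does not give its formula. As a rough illustration of how such a multi-view consensus might work, the sketch below min-max scales each method's scores and averages them, then counts how often each feature lands in a method's top k. The aggregation rule, the function names, the feature names, and every number are assumptions made for this example, not values or methods taken from the paper.

```python
from typing import Dict, List

Scores = Dict[str, float]


def min_max(scores: Scores) -> Scores:
    """Rescale one method's scores to [0, 1] so that methods are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {f: (s - lo) / span if span else 0.0 for f, s in scores.items()}


def consensus_importance(views: Dict[str, Scores]) -> Scores:
    """Average the rescaled scores over all methods (assumed aggregation rule)."""
    scaled = [min_max(v) for v in views.values()]
    features = list(scaled[0])
    return {f: sum(v[f] for v in scaled) / len(scaled) for f in features}


def top_k_frequency(views: Dict[str, Scores], k: int) -> Scores:
    """Fraction of methods that place each feature among their top k."""
    features = list(next(iter(views.values())))
    counts = {f: 0 for f in features}
    for v in views.values():
        for f in sorted(v, key=v.get, reverse=True)[:k]:
            counts[f] += 1
    return {f: c / len(views) for f, c in counts.items()}


# Hypothetical scores for illustration only; NOT results from the paper.
views = {
    "logistic_regression": {"age": 4.0, "corticosteroid_use": 2.0,
                            "gender": 1.0, "calcium_intake": 0.0},
    "mrmr": {"age": 0.8, "corticosteroid_use": 0.4,
             "gender": 0.0, "calcium_intake": 0.2},
    "relieff": {"age": 0.5, "corticosteroid_use": 0.25,
                "gender": 0.1, "calcium_intake": 0.0},
}

consensus = consensus_importance(views)
ranking: List[str] = sorted(consensus, key=consensus.get, reverse=True)
stability = top_k_frequency(views, k=2)
print(ranking)
print(stability)
```

Averaging rescaled scores is only one possible consensus rule; rank-based aggregation would be less sensitive to a single method producing extreme values. A feature that is both highly ranked and frequently in the top k across methods is what a "stable" predictor would look like in this framing.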

## Linked entities

- **Diseases:** osteoporosis (MONDO:0005298)

## Full-text entities

- **Diseases:** Osteoporosis (MESH:D010024), fracture (MESH:D050723)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12842224/full.md

---
Source: https://tomesphere.com/paper/PMC12842224