# Identification of key factors for early detection of rheumatoid arthritis in primary care using machine learning

**Authors:** Fatemeh Rahimi, Elham Rajaei, Noushin Movafagh, Ali Mohammad Hadianfard

PMC · DOI: 10.1038/s41598-025-34158-1 · 2026-01-12

## TL;DR

This study uses machine learning to identify key factors for early detection of rheumatoid arthritis in primary care, aiming to reduce delays in specialist referral.

## Contribution

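The study pairs machine learning classifiers with SHAP-based feature attribution to identify which routinely available indicators (onset symptoms, demographic data, and initial laboratory markers) best flag possible rheumatoid arthritis in primary care settings.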
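The abstract below reports that SHAP values, filtered at a threshold of 0.01, were used to select the influential features. The sketch below assumes that threshold applies to each feature's mean absolute SHAP value, which is a common way to summarize SHAP importance but is not confirmed here. The function name `select_by_mean_abs_shap`, the feature names, and the SHAP numbers are all hypothetical and do not reproduce the paper's ranking. In practice the matrix would come from a SHAP explainer (for tree ensembles such as CatBoost, `shap.TreeExplainer` is the usual choice).

```python
from typing import List, Sequence, Tuple


def select_by_mean_abs_shap(
    shap_values: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    threshold: float = 0.01,
) -> List[Tuple[str, float]]:
    """Keep features whose mean |SHAP| across rows is at least `threshold`.

    Assumption: the 0.01 cut-off is applied to mean absolute SHAP values.
    Returns (feature, importance) pairs sorted from most to least important.
    """
    n_rows = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_rows
        for j, name in enumerate(feature_names)
    }
    kept = [(name, imp) for name, imp in importance.items() if imp >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical SHAP matrix: 3 patients (rows) x 4 features (columns).
features = ["feat_a", "feat_b", "feat_c", "feat_d"]
shap_matrix = [
    [0.30, -0.05, 0.004, -0.03],
    [-0.20, 0.03, -0.006, 0.00],
    [0.25, -0.04, 0.002, 0.03],
]
selected = select_by_mean_abs_shap(shap_matrix, features)
print(selected)
```

Averaging absolute values matters because SHAP values are signed: a feature that pushes some patients toward RA and others away would otherwise cancel out and look unimportant.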

## Key findings

- CatBoost performed best of the five models under 5-fold nested cross-validation, with AUC-ROC of 0.966, accuracy of 0.947, and F1-score of 0.951.
- SHAP analysis identified Anti-CCP, tender joint count, and swollen joint count as the most important factors for early RA detection.
- Fatigue, age, and positive RF also markedly increased the likelihood of rheumatoid arthritis.
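To make the three reported metrics concrete, here is a minimal pure-Python sketch of how each is computed for a binary RA/non-RA classifier. The labels and scores are invented toy data, not the study's; the 0.5 decision threshold is an assumption. AUC-ROC is computed via its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one), which is equivalent to the area under the ROC curve.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive (RA) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive outranks random negative); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = RA, 0 = not RA; scores are predicted RA probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))  # 0.75
print(f1_score(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores))   # 0.9375
```

Note that AUC-ROC uses the raw scores and is threshold-free, whereas accuracy and F1 depend on the chosen cut-off; this is why a model can rank patients well (high AUC) while its accuracy and F1 still shift with the threshold.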

## Abstract

Rheumatoid arthritis (RA) is a chronic disease that causes irreversible joint damage. Early detection, especially in primary care settings, is crucial for effective disease management. This study aimed to identify the factors that help screen individuals at risk of RA to reduce delays in referral to rheumatologists. This analytical and applied research used a questionnaire to gather data from 377 patients at a rheumatology diagnostic center in Ahvaz, Iran, between August and November 2024. Study variables included patients’ articular and extra-articular symptoms at disease onset, demographic data, and initial laboratory markers. After a statistical correlation analysis, the dataset was split into training (80%) and testing (20%) subsets. Five machine learning models were developed, and the SHAP method was applied to the best-performing model to identify influential features. Results obtained via 5-fold nested cross-validation identified the CatBoost model as the top performer, with AUC-ROC = 0.966, Accuracy = 0.947, and F1-Score = 0.951. SHAP (with a threshold of 0.01) highlighted the following significant features: Anti-CCP, tender joint count, swollen joint count, gastrointestinal issues, fatigue, age, RF (Rheumatoid Factor), and hearing problems. Given the importance of early RA diagnosis and the challenges encountered in primary care, three main screening factors stand out: Anti-CCP, tender joint count, and swollen joint count. These, along with fatigue, age, and positive RF, markedly increase the likelihood of RA and justify referring a patient to a specialist.
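The abstract reports performance from 5-fold nested cross-validation. The pure-Python sketch below shows the general structure of that technique: an outer loop that estimates generalization on held-out folds, and an inner loop that selects a hyperparameter using only the outer-training data, so the test fold never influences tuning. Everything model-specific is a hypothetical stand-in: `fit_threshold` is a one-feature threshold classifier rather than CatBoost, the offset grid and the inner fold count of 3 are assumptions, and the data are synthetic.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def kfold_indices(n: int, k: int, seed: int) -> List[List[int]]:
    """Shuffle 0..n-1 once and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_cv(xs: Sequence[float], ys: Sequence[int], candidates: Sequence[float],
              fit: Callable, score: Callable,
              outer_k: int = 5, inner_k: int = 3, seed: int = 0) -> List[float]:
    """Return one score per outer fold; tuning happens strictly inside each outer-training split."""
    outer_folds = kfold_indices(len(ys), outer_k, seed)
    outer_scores = []
    for i, test_idx in enumerate(outer_folds):
        train_idx = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
        tr_x = [xs[j] for j in train_idx]
        tr_y = [ys[j] for j in train_idx]
        # Inner loop: pick the hyperparameter with the best mean validation score.
        inner_folds = kfold_indices(len(tr_y), inner_k, seed + i + 1)
        best_param, best_score = None, float("-inf")
        for param in candidates:
            fold_scores = []
            for m, val_idx in enumerate(inner_folds):
                fit_idx = [j for f, fold in enumerate(inner_folds) if f != m for j in fold]
                model = fit([tr_x[j] for j in fit_idx], [tr_y[j] for j in fit_idx], param)
                fold_scores.append(score(model, [tr_x[j] for j in val_idx], [tr_y[j] for j in val_idx]))
            if mean(fold_scores) > best_score:
                best_param, best_score = param, mean(fold_scores)
        # Refit on the whole outer-training split, then score once on the untouched outer test fold.
        model = fit(tr_x, tr_y, best_param)
        outer_scores.append(score(model, [xs[j] for j in test_idx], [ys[j] for j in test_idx]))
    return outer_scores


def fit_threshold(xs: List[float], ys: List[int], offset: float) -> float:
    """Hypothetical model: threshold midway between class means, shifted by `offset`."""
    pos = [x for x, t in zip(xs, ys) if t == 1]
    neg = [x for x, t in zip(xs, ys) if t == 0]
    return (mean(pos) + mean(neg)) / 2 + offset


def accuracy_score(threshold: float, xs: List[float], ys: List[int]) -> float:
    """Accuracy of predicting RA (1) whenever x exceeds the threshold."""
    return sum((x > threshold) == (t == 1) for x, t in zip(xs, ys)) / len(ys)


# Synthetic, balanced data: positives centred at 3.0, negatives at 0.0.
rng = random.Random(42)
ys = [i % 2 for i in range(100)]
xs = [rng.gauss(3.0 if t == 1 else 0.0, 1.0) for t in ys]
scores = nested_cv(xs, ys, candidates=[-0.5, 0.0, 0.5], fit=fit_threshold, score=accuracy_score)
print(len(scores), round(mean(scores), 3))
```

The reported figure would typically be the mean of the outer-fold scores. The design point is that the hyperparameter chosen in each outer iteration may differ; nested cross-validation evaluates the whole tuning procedure rather than one fixed configuration, which avoids the optimistic bias of tuning and scoring on the same folds.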

## Linked entities

- **Chemicals:** RF (PubChem CID 150964)
- **Diseases:** rheumatoid arthritis (MONDO:0008383)

## Full-text entities

- **Genes:** CA8 (carbonic anhydrase 8 (inactive)) [NCBI Gene 767] {aka CA-RP, CA-VIII, CALS, CAMRQ3, CARP, SCAR34}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** auditory symptoms (MESH:D006311), RF (MESH:D001171), autoimmune disease (MESH:D001327), gastric atrophy (MESH:D001284), auditory problems (MESH:D019973), hearing impairment (MESH:D034381), RA (MESH:D001172), Fatigue (MESH:D005221), malignancies (MESH:D009369), ulcers (MESH:D014456), joint damage (MESH:D007592), premature death (MESH:D003643), swelling (MESH:D004487), functional disability (MESH:D003291), immune dysfunction (MESH:D007154), morning stiffness (MESH:D048968), gastrointestinal complications (MESH:D005767), bacterial or viral infections (MESH:D014777), intestinal metaplasia (MESH:D007410), vasculitis (MESH:D014657), inflammation (MESH:D007249), rheumatologic (MESH:D012216), crystalline arthropathies (MESH:D000070657), gastric mucosal (MESH:D013272), pain (MESH:D010146)
- **Species:** Nicotiana tabacum (American tobacco, species) [taxon 4097], Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12855854/full.md

---
Source: https://tomesphere.com/paper/PMC12855854