# Discovery of Novel NMR-Based Biomarkers and Interpretable Machine Learning Models for Risk Prediction of Rheumatoid Arthritis

**Authors:** Hong Lin, Rui Wang, Linyan Lu, Ping Tian, Xiaodi Yang, Lianbo Xiao, Qing-Hua Li, Guo-Qiang Lin

PMC · DOI: 10.3390/metabo16030153 · Metabolites · 2026-02-25

## TL;DR

This exploratory, single-center study uses ¹H-NMR serum metabolomics to identify candidate biomarkers for rheumatoid arthritis (RA). It also builds machine learning models for RA screening and for predicting disease activity (DAS-28).

## Contribution

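The study proposes a panel of seven candidate serum biomarkers (endogenous metabolites and lipoprotein parameters quantified by ¹H-NMR). It also presents an interpretable, SHAP-explained Random Forest model for RA risk prediction.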

## Key findings

- Formic acid and high-density lipoprotein 4 phospholipids (H4PL) were among the seven selected biomarkers significantly associated with RA.
- The Random Forest model showed promising discriminatory ability in the internal test set.
- A proof-of-concept DAS-28 regression model explained 54.8% of the variance (R² = 0.548) in this cohort.
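The "variance explained" in the third finding is the coefficient of determination. The formula below assumes the standard definition; the summary does not say whether the value was computed on training or held-out data.

$$
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$

Here $y_i$ is the observed DAS-28 score for patient $i$, $\hat{y}_i$ is the model's prediction, and $\bar{y}$ is the mean observed score. With $R^2 = 0.548$, the residual sum of squares equals $1 - 0.548 = 0.452$ of the total sum of squares. In other words, about 45.2% of the variability in DAS-28 remains unexplained by the model.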

## Abstract

**Background:** Early diagnosis of rheumatoid arthritis (RA) remains challenging due to the limited performance of existing serum biomarkers. This exploratory study aimed to identify novel serum metabolite and lipoprotein biomarkers for RA and to develop interpretable machine learning models for screening.

**Methods:** This study employed ¹H-NMR metabolomics to analyze serum from 77 RA patients and 70 healthy controls, quantifying 38 endogenous metabolites and 112 lipoprotein parameters. Seven key biomarkers were identified using multiple criteria and Least Absolute Shrinkage and Selection Operator (LASSO) regression. The dataset was split into training and testing sets (7:3 ratio), and four machine learning models were constructed. The Random Forest (RF) model was further interpreted using the SHapley Additive exPlanations (SHAP) method.

**Results:** The selected biomarkers, including formic acid and High-density lipoprotein 4 phospholipids (H4PL), showed significant associations with RA. In the internal test set, the RF model demonstrated promising discriminatory ability. Additionally, a proof-of-concept regression model for predicting the Disease Activity Score in 28 joints (DAS-28) score was developed, explaining a portion of its variance (R² = 0.548) in this cohort.

**Conclusions:** This exploratory, single-center study identifies a novel panel of potential biomarkers for RA and provides a preliminary, interpretable predictive tool. The findings, particularly the internally validated high performance of certain markers, are hypothesis-generating and underscore the need for validation in larger, multi-center cohorts. The DAS-28 prediction model also warrants further investigation.
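The workflow in the Methods is a LASSO feature-selection step, a 7:3 train/test split, and a Random Forest classifier. It can be sketched as below. This is a minimal illustration assuming scikit-learn, not the authors' code.

Several parts of the sketch are assumptions:

- The data matrix is synthetic (`make_classification`), sized only to mimic 147 subjects by 150 NMR features.
- `LassoCV` fits a linear LASSO on the 0/1 labels. The study may instead have used a logistic LASSO alongside its other criteria.
- The "keep the top 7 coefficients" rule is hypothetical.
- The Random Forest settings are hypothetical.

Only one of the four models is shown, and the SHAP interpretation step is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the serum NMR matrix (NOT the study's data):
# 147 subjects (roughly 77 cases / 70 controls) x 150 features
# (38 metabolites + 112 lipoprotein parameters).
X, y = make_classification(
    n_samples=147, n_features=150, n_informative=7, n_redundant=0,
    weights=[70 / 147], random_state=0,
)

# 7:3 train/test split, stratified to keep the case/control ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)

# LASSO step, fitted on the training set only so the test set stays untouched.
# Features are standardised first because the L1 penalty is scale-sensitive.
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
lasso.fit(X_train, y_train)
coef = lasso[-1].coef_

# Hypothetical selection rule: keep the 7 features with the largest |coefficient|.
N_SELECTED = 7
selected = np.argsort(-np.abs(coef))[:N_SELECTED]

# Random Forest on the selected features, scored by ROC AUC on the held-out set.
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_train[:, selected], y_train)
test_scores = rf.predict_proba(X_test[:, selected])[:, 1]
auc = roc_auc_score(y_test, test_scores)

print("selected feature indices:", sorted(selected.tolist()))
print(f"internal test AUC: {auc:.3f}")
```

Fitting the scaler and the LASSO only on the training portion keeps the internal test estimate free of information leakage. With about 45 test subjects, the AUC will still shift noticeably with the random split. That variability is one reason the abstract stresses validation in larger, multi-center cohorts.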

## Linked entities

- **Chemicals:** formic acid (PubChem CID 284)
- **Diseases:** rheumatoid arthritis (MONDO:0008383), RA (MONDO:0005272)

## Full-text entities

- **Genes:** APOA2 (apolipoprotein A2) [NCBI Gene 336] {aka APOA2D, Apo-AII, ApoA-II, apoAII}, IL6 (interleukin 6) [NCBI Gene 3569] {aka BSF-2, BSF2, CDF, HGF, HSF, IFN-beta-2}, APOB (apolipoprotein B) [NCBI Gene 338] {aka FCHL2, FLDB, LDLCQ4, apoB-100, apoB-48}, PLA2G1B (phospholipase A2 group IB) [NCBI Gene 5319] {aka PLA2, PLA2A, PPLA2}, APOA1 (apolipoprotein A1) [NCBI Gene 335] {aka AMYLD3, HPALP2, apo(a)}, IL1B (interleukin 1 beta) [NCBI Gene 3553] {aka IL-1, IL1-BETA, IL1F2, IL1beta}, TNF (tumor necrosis factor) [NCBI Gene 7124] {aka DIF, IMD127, TNF-alpha, TNFA, TNFSF2, TNLG1F}, NLRP3 (NLR family pyrin domain containing 3) [NCBI Gene 114548] {aka AGTAVPRL, AII, AVP, C1orf7, CIAS1, CLR1.1}
- **Diseases:** atherosclerosis (MESH:D050197), mitochondrial dysfunction (MESH:D028361), cachexia (MESH:D002100), RA (MESH:D001172), gut dysbiosis (MESH:D064806), Sjögren's syndrome (MESH:D012859), weakness (MESH:D018908), cardiovascular complications (MESH:D002318), polyarthritis (MESH:D001168), muscle wasting (MESH:D009133), dyslipidemia (MESH:D050171), infections (MESH:D007239), injury (MESH:D014947), synovitis (MESH:D013585), chronic inflammation (MESH:D007249), hypoxic (MESH:D002534), fatigue (MESH:D005221), joint damage (MESH:D007592), hypoxia (MESH:D000860), rheumatic diseases (MESH:D012216), malignancy (MESH:D009369), muscle loss (MESH:D009135), autoimmune disease (MESH:D001327), systemic lupus erythematosus (MESH:D008180), metabolic abnormality (MESH:D008659), joint destruction (MESH:D008105)
- **Chemicals:** TCA (MESH:D014238), triglycerides (MESH:D014280), lipid (MESH:D008055), sarcosine (MESH:D012521), Acetic acid (MESH:D019342), Ethanol (MESH:D000431), methionine (MESH:D008715), FC (MESH:C095424), leukotrienes (MESH:D015289), glutamine (MESH:D005973), Citric acid (MESH:D019343), TG (MESH:D013866), prostaglandins (MESH:D011453), 3-Hydroxybutyric acid (MESH:D020155), cholesterol (MESH:D002784), SCFA (MESH:D005232), Formic acid (MESH:C030544), glucose (MESH:D005947), carbohydrate (MESH:D002241), arachidonic acid (MESH:D016718), Phospholipids (MESH:D010743), amino acid (MESH:D000596), Creatinine (MESH:D003404), phosphatidylcholine (MESH:D010713)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13028240/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13028240/full.md

## References

56 references — full list in the complete paper: https://tomesphere.com/paper/PMC13028240/full.md

---
Source: https://tomesphere.com/paper/PMC13028240