# Leveraging Subjective Parameters and Biomarkers in Machine Learning Models: The Feasibility of lnc-IL7R for Managing Emphysema Progression

**Authors:** Tzu-Tao Chen, Tzu-Yu Cheng, I-Jung Liu, Shu-Chuan Ho, Kang-Yun Lee, Huei-Tyng Huang, Po-Hao Feng, Kuan-Yuan Chen, Ching-Shan Luo, Chien-Hua Tseng, Yueh-His Chen, Arnab Majumdar, Cheng-Yu Tsai, Sheng-Ming Wu

PMC · DOI: 10.3390/diagnostics15091165 · 2025-05-03

## TL;DR

This study explores combining a novel biomarker, lnc-IL7R, with accessible clinical data in machine learning models to classify emphysema, a major phenotype of COPD.

## Contribution

The study introduces lnc-IL7R as a novel biomarker for emphysema classification in machine learning models.

## Key findings

- lnc-IL7R fold changes were strongly and negatively associated with LAA%, and lnc-IL7R expression was lower in participants with emphysema (LAA% ≥15%).
- Of the five ML models tested, the random forest model achieved the highest accuracy and AUROC, with both exceeding 75%.
- lnc-IL7R was identified as the strongest predictor for emphysema classification, followed by CAT scores and BMI.
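
The two metrics reported above, accuracy and AUROC, can be made concrete with a small standard-library Python sketch. It assumes binary labels (1 = LAA% ≥15%, 0 = LAA% <15%) and one model score per participant; the labels and scores in the usage example are invented for illustration and are not study data. AUROC is computed here as the probability that a randomly chosen emphysema case receives a higher score than a randomly chosen non-emphysema case, with ties counted as one half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted binary labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as 0.5, which matches
    the area under the ROC curve computed with trapezoidal interpolation.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented example: 1 = emphysema (LAA% >= 15%), 0 = no emphysema.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
    # Hypothetical decision threshold of 0.5 on the model score.
    y_pred = [int(s >= 0.5) for s in scores]
    print(f"accuracy = {accuracy(y_true, y_pred):.4f}")  # accuracy = 0.7500
    print(f"AUROC = {auroc(y_true, scores):.4f}")  # AUROC = 0.9375
```

Accuracy depends on the chosen decision threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the two can differ for the same model. The pairwise loop is O(n_pos × n_neg), acceptable for a cohort-sized dataset; `sklearn.metrics.accuracy_score` and `sklearn.metrics.roc_auc_score` compute the same quantities. The study's own evaluation code is not described in this summary.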

## Abstract

Background/Objectives: Chronic obstructive pulmonary disease (COPD) remains a leading cause of death worldwide, and emphysema progression provides valuable insight into disease development. Established clinical assessments, including pulmonary function tests and high-resolution computed tomography, are limited by accessibility constraints and, for computed tomography, radiation exposure. This study therefore proposed an alternative approach that integrates the novel biomarker long non-coding interleukin-7 receptor α-subunit gene (lnc-IL7R), together with other easily accessible clinical and biochemical metrics, into machine learning (ML) models.

Methods: This cohort study collected baseline characteristics, COPD Assessment Test (CAT) scores, and biochemical data from the enrolled participants. Associations with emphysema severity, dichotomized at a low attenuation area percentage (LAA%) threshold of 15%, were evaluated using simple and multivariable-adjusted models. The dataset was then split into a training-and-validation subset (80%) and a test subset (20%). Five ML models were employed, and the best-performing model was further analyzed for feature importance.

Results: Most participants were elderly men. Compared with the LAA% <15% group, the LAA% ≥15% group had a significantly higher body mass index (BMI), poorer pulmonary function, and lower lnc-IL7R expression (all p < 0.01). Fold changes in lnc-IL7R were strongly and negatively associated with LAA% (p < 0.01). The random forest (RF) model achieved the highest accuracy and area under the receiver operating characteristic curve (AUROC) across datasets. Feature importance analysis identified lnc-IL7R fold changes as the strongest predictor for emphysema classification (LAA% ≥15%), followed by CAT scores and BMI.

Conclusions: ML models incorporating accessible clinical and biochemical markers, particularly the novel biomarker lnc-IL7R, achieved classification accuracy and AUROC exceeding 75% in emphysema assessment. These findings offer promising opportunities for improving emphysema classification and COPD management.
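
The labeling rule and the 80%/20% partition described in the Methods can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the LAA% values are invented, and stratifying the split by class is an assumption (the abstract states only the 80%/20% proportions). Stratification keeps the ratio of emphysema to non-emphysema cases similar in both subsets, which matters when one class is smaller.

```python
import random
from typing import Dict, List, Sequence, Tuple

# LAA% cut-off used in the study to define emphysema (LAA% >= 15%).
LAA_THRESHOLD = 15.0


def emphysema_label(laa_percent: float, threshold: float = LAA_THRESHOLD) -> int:
    """Hypothetical helper: 1 if LAA% >= threshold (emphysema), else 0."""
    return int(laa_percent >= threshold)


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split row indices into (train/validation, test).

    Each class is shuffled and split separately, so class proportions are
    approximately preserved. Stratification is an assumption of this sketch.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train_val: List[int] = []
    test: List[int] = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train_val.extend(idxs[n_test:])
    return sorted(train_val), sorted(test)


if __name__ == "__main__":
    laa = [3.2, 18.5, 22.0, 9.9, 15.0, 7.4, 30.1, 12.0, 16.7, 5.5]  # invented values
    labels = [emphysema_label(x) for x in laa]
    print("labels:", labels)  # labels: [0, 1, 1, 0, 1, 0, 1, 0, 1, 0]
    train_val, test = stratified_split(labels, test_fraction=0.2, seed=42)
    print("train/validation indices:", train_val)
    print("test indices:", test)
```

With 10 participants split evenly between classes, each class contributes one index to the test subset. How the 80% portion was further divided for validation (for example, a held-out set or cross-validation) is not stated in this summary, so the sketch leaves it as a single training-and-validation pool. In scikit-learn, `train_test_split(..., test_size=0.2, stratify=labels)` performs an equivalent stratified split.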

## Linked entities

- **Genes:** IL7R (interleukin 7 receptor) [NCBI Gene 3575]
- **Diseases:** COPD (MONDO:0005002), emphysema (MONDO:0004849)

## Full-text entities

- **Diseases:** COPD (MESH:D029424), death (MESH:D003643), Emphysema (MESH:D004646)

## Figures

The complete paper includes one figure with its caption: https://tomesphere.com/paper/PMC12071574/full.md

---
Source: https://tomesphere.com/paper/PMC12071574