# Identification of Key Genes in Diabetic Retinopathy Using Bioinformatics and Machine Learning Approaches

**Authors:** Xiaoyan Zhu, Lulu Lian

PMC · DOI: 10.7759/cureus.99936 · Cureus · 2025-12-23

## TL;DR

This study identifies COL6A2 as a key gene in diabetic retinopathy, using bioinformatics and machine learning to explore its role in disease mechanisms and early diagnosis.

## Contribution

The study proposes COL6A2 as a novel candidate biomarker for diabetic retinopathy, with high diagnostic accuracy and potential therapeutic relevance.

## Key findings

- COL6A2 is significantly upregulated in diabetic retinopathy and shows high diagnostic accuracy, with AUC values of 1.00 and 0.89 in the training and validation sets, respectively.
- COL6A2 is associated with extracellular matrix organization, cell adhesion, angiogenesis, and inflammatory responses in diabetic retinopathy.
- hsa-miR-762 and hsa-miR-29a-3p may indirectly regulate COL6A2 through competitive binding with lncRNAs such as PABPC1L2B-AS1 and RP11-223P11.3, forming a potential ceRNA axis.
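The ceRNA finding can be read as a join over two prediction tables: a lncRNA-miRNA-mRNA triplet exists when a miRNA is predicted to target the mRNA and is also predicted to bind the lncRNA. The sketch below assumes the database outputs have already been reduced to plain (miRNA, target) and (lncRNA, miRNA) pairs. The function name `build_cerna_triplets` and the lncRNA names `lncRNA_A`, `lncRNA_B`, `lncRNA_C` are hypothetical placeholders. The summary does not state which lncRNA pairs with which miRNA, so none of these pairings should be read as the paper's result.

```python
from collections import defaultdict


def build_cerna_triplets(mirna_mrna, lnc_mirna, target_mrna):
    """Join predicted pairs into sorted (lncRNA, miRNA, mRNA) triplets.

    mirna_mrna: iterable of (miRNA, mRNA) predicted targeting pairs.
    lnc_mirna:  iterable of (lncRNA, miRNA) predicted binding pairs.
    """
    # miRNAs predicted to target the mRNA of interest
    upstream = {mir for mir, mrna in mirna_mrna if mrna == target_mrna}

    # Index lncRNAs by the miRNA they are predicted to bind (sponge)
    lncs_by_mir = defaultdict(set)
    for lnc, mir in lnc_mirna:
        lncs_by_mir[mir].add(lnc)

    # A triplet requires a miRNA shared by both prediction sets
    return sorted(
        (lnc, mir, target_mrna)
        for mir in upstream
        for lnc in lncs_by_mir.get(mir, ())
    )


if __name__ == "__main__":
    # Hypothetical pairings for illustration only
    mirna_mrna = [("hsa-miR-762", "COL6A2"), ("hsa-miR-29a-3p", "COL6A2")]
    lnc_mirna = [
        ("lncRNA_A", "hsa-miR-762"),
        ("lncRNA_B", "hsa-miR-29a-3p"),
        ("lncRNA_C", "miR_other"),  # binds a miRNA that does not target COL6A2
    ]
    for triplet in build_cerna_triplets(mirna_mrna, lnc_mirna, "COL6A2"):
        print(triplet)
```

Because the join is keyed on the shared miRNA, `lncRNA_C` drops out. Its miRNA has no predicted link to the target mRNA, so it cannot sit on a ceRNA axis with that mRNA.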

## Abstract

Objective

This study aimed to identify key genes associated with diabetic retinopathy (DR) by applying bioinformatics and machine learning techniques to publicly available transcriptomic datasets. We further evaluated their diagnostic performance and explored their potential biological functions and upstream regulatory mechanisms, providing a theoretical basis for the early diagnosis and molecular-targeted therapy of DR.

Methods

DR-related transcriptomic datasets GSE94019 and GSE60436 were obtained from the Gene Expression Omnibus (GEO) database, with GSE94019 serving as the training set and GSE60436 as the validation set. The data were normalized and then subjected to differential expression analysis. Feature genes were selected using the Least Absolute Shrinkage and Selection Operator (LASSO) regression and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) algorithms, and genes selected by both algorithms were taken as key candidates. Diagnostic performance was evaluated with receiver operating characteristic (ROC) curves plotted using the R package pROC. Functional enrichment analysis, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses, was performed on the differentially expressed genes (DEGs) associated with the key gene. Potential upstream miRNAs and lncRNAs were predicted using the miRanda, miRDB, TargetScan, and spongeScan databases, and a lncRNA-miRNA-mRNA regulatory network was constructed.
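The bookkeeping around the two feature selectors can be sketched as follows. The LASSO and SVM-RFE fits themselves are not reproduced here. The sketch covers two steps only: splitting genes into up- and downregulated DEGs from per-gene statistics, and keeping the overlap of the two selected sets. The helper names `split_degs` and `key_candidates` are hypothetical. The cut-offs (|log2FC| ≥ 1, adjusted p < 0.05) are assumed common defaults, not thresholds reported in this summary.

```python
def split_degs(stats, lfc_cut=1.0, padj_cut=0.05):
    """Return sorted (up, down) gene lists from {gene: (log2FC, adjusted p)}.

    The default cut-offs are assumptions, not the paper's reported values.
    """
    up, down = [], []
    for gene, (lfc, padj) in stats.items():
        # A DEG must pass both the significance and the effect-size filter
        if padj < padj_cut and abs(lfc) >= lfc_cut:
            (up if lfc > 0 else down).append(gene)
    return sorted(up), sorted(down)


def key_candidates(lasso_genes, svm_rfe_genes):
    """Keep only the genes that both feature selectors retained."""
    return sorted(set(lasso_genes) & set(svm_rfe_genes))


if __name__ == "__main__":
    # Hypothetical selector outputs; GENE_A and GENE_B are placeholders
    lasso = ["COL6A2", "LINC01247", "GENE_A"]
    svm_rfe = ["LINC01247", "COL6A2", "GENE_B"]
    print(key_candidates(lasso, svm_rfe))
```

Taking the intersection is a conservative choice. A gene must survive two selectors with different inductive biases, an L1 penalty and recursive elimination by SVM weights, which shrinks the candidate list before validation.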

Results

A total of 790 DEGs were identified, including 370 upregulated and 419 downregulated genes. Intersecting the feature genes selected by LASSO and SVM-RFE identified Collagen Type VI Alpha 2 Chain (COL6A2) and LINC01247 as key genes. COL6A2 was significantly upregulated in the DR group, and ROC analysis revealed high diagnostic accuracy, with area under the curve (AUC) values of 1.00 (training set) and 0.89 (validation set). In contrast, LINC01247 was significantly downregulated. Its AUC was 1.00 in the training set but only 0.52 in the validation set, close to the chance level of 0.5, indicating limited diagnostic value; it was therefore excluded from further analysis. Functional enrichment of the DEGs associated with COL6A2 suggested involvement in aberrant extracellular matrix (ECM) organization, cell adhesion, angiogenesis, and inflammatory responses. Moreover, regulatory network analysis indicated that hsa-miR-762 and hsa-miR-29a-3p may indirectly regulate COL6A2 expression by competitively binding multiple lncRNAs (e.g., PABPC1L2B-AS1 and RP11-223P11.3), forming a potential competing endogenous RNA (ceRNA) regulatory axis.
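The AUC values above have a direct probabilistic reading. The empirical ROC AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. This is why LINC01247's validation AUC of 0.52 signals almost no discrimination. The sketch below computes that quantity by pairwise comparison. The name `roc_auc` and the `higher_in_cases` flag are hypothetical. pROC handles the same issue through the `direction` argument of its `roc()` function, and a downregulated gene such as LINC01247 needs the reversed direction.

```python
def roc_auc(cases, controls, higher_in_cases=True):
    """Empirical ROC AUC via the Mann-Whitney pairwise formulation.

    Counts, over all (case, control) pairs, how often the case ranks on the
    disease side of the control; ties contribute 0.5. Set
    higher_in_cases=False for markers that are lower in cases.
    """
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    sign = 1 if higher_in_cases else -1
    wins = 0.0
    for c in cases:
        for k in controls:
            diff = sign * (c - k)
            if diff > 0:
                wins += 1.0
            elif diff == 0:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy expression values, not data from the paper
    print(roc_auc([3.0, 4.0, 5.0], [1.0, 2.0]))  # perfectly separated
    print(roc_auc([1.0, 2.0], [1.0, 2.0]))  # indistinguishable groups
```

The double loop is O(n·m), which is fine for cohorts of this size. A rank-based version gives the same value in O(n log n) for larger datasets.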

Conclusion

This study identifies COL6A2 as a key gene in DR, characterized by significant upregulation in DR tissues and close involvement in ECM remodeling, cell adhesion, and angiogenesis. These findings provide novel molecular targets and theoretical insights for elucidating the molecular mechanisms of DR and for improving early diagnostic strategies.

## Linked entities

- **Genes:** COL6A2 (collagen type VI alpha 2 chain) [NCBI Gene 1292], LINC01247 (long intergenic non-protein coding RNA 1247) [NCBI Gene 101929390]
- **Diseases:** diabetic retinopathy (MONDO:0005266)

## Full-text entities

- **Genes:** PABPC1L2B-AS1 (PABPC1L2B antisense RNA 1) [NCBI Gene 101928345], LINC01247 (long intergenic non-protein coding RNA 1247) [NCBI Gene 101929390], COL6A2 (collagen type VI alpha 2 chain) [NCBI Gene 1292] {aka BTHLM1, BTHLM1B, PP3610, UCMD1, UCMD1B}, MIR762 (microRNA 762) [NCBI Gene 100313837] {aka hsa-mir-762}
- **Diseases:** inflammation (MESH:D007249), diabetic retinopathy (DR) (MESH:D003930)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12826082/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12826082/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/PMC12826082/full.md

---
Source: https://tomesphere.com/paper/PMC12826082