# Transcriptome Complexity Disentangled: A Regulatory Molecules Approach

**Authors:** Amir Asiaee, Zachary B. Abrams, Heather H. Pua, Kevin R. Coombes

PMC · DOI: 10.3390/ijms26062510 · International Journal of Molecular Sciences · 2025-03-11

## TL;DR

This study shows that 56 regulatory molecules (transcription factors and microRNAs) can predict genome-wide gene expression across 31 cancer types, suggesting that the transcriptome has a low-dimensional underlying structure.

## Contribution

The main contribution is showing that 56 regulatory molecules, the medoids of 28 miRNA clusters and 28 TF clusters, are enough to predict genome-wide gene expression, reaching an R² of 0.70 when tissue-specific information is included.

## Key findings

- The medoids of 28 miRNA clusters and 28 TF clusters differentiate tissues of origin with 92.8% accuracy.
- The Tissue-Aware model achieved an R² of 0.70 in predicting gene expression from only 56 regulatory molecules.
- These results suggest that the transcriptome has an intrinsically low-dimensional structure that a few regulatory molecules can capture.
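
To make the Tissue-Agnostic versus Tissue-Aware comparison concrete, here is a minimal sketch on synthetic data. It assumes ordinary least squares and treats "tissue-aware" as fitting one model per tissue; this summary does not describe the paper's actual model form, so both choices, along with every function and variable name below, are illustrative assumptions rather than the authors' method.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_predict_ols(X, y):
    """Fit least squares with an intercept and return in-sample predictions."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

def tissue_agnostic_r2(X, y):
    """One shared model for all samples (assumed form, not the paper's)."""
    return r_squared(y, fit_predict_ols(X, y))

def tissue_aware_r2(X, y, tissue):
    """One model per tissue label (assumed form, not the paper's)."""
    y_hat = np.empty_like(y)
    for t in np.unique(tissue):
        mask = tissue == t
        y_hat[mask] = fit_predict_ols(X[mask], y[mask])
    return r_squared(y, y_hat)

# Synthetic stand-in: 5 "regulators", 3 "tissues", 1 target gene.
rng = np.random.default_rng(0)
n, p = 600, 5
X = rng.normal(size=(n, p))
tissue = rng.integers(0, 3, size=n)
betas = rng.normal(size=(3, p))          # tissue-specific effects
offsets = np.array([-2.0, 0.0, 2.0])     # tissue-specific baselines
y = (np.einsum("ij,ij->i", X, betas[tissue])
     + offsets[tissue]
     + rng.normal(scale=0.5, size=n))

r2_agnostic = tissue_agnostic_r2(X, y)
r2_aware = tissue_aware_r2(X, y, tissue)
print(f"agnostic R^2 = {r2_agnostic:.2f}, aware R^2 = {r2_aware:.2f}")
```

Both scores here are in-sample, where the per-tissue fit can never score below the shared fit because it contains the shared model as a special case. A held-out split would be needed for any claim about generalization.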

## Abstract

Transcription factors (TFs) and microRNAs (miRNAs) are fundamental regulators of gene expression, cell state, and biological processes. This study investigated whether a small subset of TFs and miRNAs could accurately predict genome-wide gene expression. We analyzed 8895 samples across 31 cancer types from The Cancer Genome Atlas and identified 28 miRNA and 28 TF clusters using unsupervised learning. Medoids of these clusters could differentiate tissues of origin with 92.8% accuracy, demonstrating their biological relevance. We developed Tissue-Agnostic and Tissue-Aware models to predict the expression of 20,000 genes using the 56 selected medoid miRNAs and TFs. The Tissue-Aware model attained an R² of 0.70 by incorporating tissue-specific information. Despite measuring only 1/400th of the transcriptome, the prediction accuracy was comparable to that achieved by the 1000 landmark genes. This suggests the transcriptome has an intrinsically low-dimensional structure that can be captured by a few regulatory molecules. Our approach could enable cheaper transcriptome assays and analysis of low-quality samples. It also provides insights into genes that are heavily regulated by miRNAs/TFs versus alternative mechanisms. However, model transportability was impacted by dataset discrepancies, especially in miRNA distribution. Overall, this study demonstrates the potential of a biology-guided approach for robust transcriptome representation.
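
The medoid selection mentioned in the abstract can be sketched as follows. This is an illustration only: it assumes cluster labels are already available from some unsupervised step and uses correlation distance (1 minus Pearson correlation) to define the medoid. The abstract specifies neither the clustering method nor the distance, so both are assumptions, and `cluster_medoids` is a hypothetical helper.

```python
import numpy as np

def cluster_medoids(X, labels):
    """For each cluster, return the index of the feature whose total
    distance to the other members of its cluster is smallest.

    X      : (n_samples, n_features) expression matrix
    labels : (n_features,) cluster label per feature
    """
    # Assumed distance: 1 - Pearson correlation between feature columns.
    dist = 1.0 - np.corrcoef(X, rowvar=False)
    medoids = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        within = dist[np.ix_(members, members)]
        medoids[int(c)] = int(members[np.argmin(within.sum(axis=1))])
    return medoids

# Synthetic example: two clusters of three features each. In each cluster
# the middle feature is the shared signal; the others add independent noise.
rng = np.random.default_rng(1)
n = 2000
base_a = rng.normal(size=n)
base_b = rng.normal(size=n)
X = np.column_stack([
    base_a + 0.7 * rng.normal(size=n),  # feature 0, cluster 0
    base_a,                             # feature 1, cluster 0 (central)
    base_a + 0.7 * rng.normal(size=n),  # feature 2, cluster 0
    base_b + 0.7 * rng.normal(size=n),  # feature 3, cluster 1
    base_b,                             # feature 4, cluster 1 (central)
    base_b + 0.7 * rng.normal(size=n),  # feature 5, cluster 1
])
labels = np.array([0, 0, 0, 1, 1, 1])
medoids = cluster_medoids(X, labels)
print(medoids)
```

In the setting of the abstract, `X` would hold miRNA or TF expression across samples and `labels` would assign each molecule to one of the 28 clusters, yielding one representative molecule per cluster.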

## Linked entities

- **Diseases:** cancer (MONDO:0004992)

## Full-text entities

- **Genes:** F3 (coagulation factor III, tissue factor) [NCBI Gene 2152] {aka CD142, TF, TFA}
- **Diseases:** Cancer (MESH:D009369)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11942001/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11942001/full.md

## References

66 references — full list in the complete paper: https://tomesphere.com/paper/PMC11942001/full.md

---
Source: https://tomesphere.com/paper/PMC11942001