# A Gold Standard-Derived Modular Barcoding Approach to Cancer Transcriptomics

**Authors:** Yan Zhu, Mohamad Karim I. Koleilat, Jason Roszik, Man Kam Kwong, Zhonglin Wang, Dipen M. Maru, Scott Kopetz, Lawrence N. Kwong

PMC · DOI: 10.3390/cancers16101886 · Cancers · 2024-05-15

## TL;DR

This paper introduces modular barcoding, a user-friendly approach to cancer transcriptomics that uses high-quality datasets to create visual gene modules for easier analysis and discovery.

## Contribution

The novel contribution is the development of modular barcoding, which uses gold-standard datasets to generate accessible and adaptable gene expression modules for cancer analysis.

## Key findings

- Modular barcoding uncovers novel gene relationships and improves analysis of lower-resolution datasets.
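- The method can recreate and expand known cancer subtyping schemes and act as a "decoder" that bridges seemingly disparate established gene signatures.
- It allows single-cell RNA sequencing information to be applied efficiently to other datasets, and it supports rapid hypothesis generation and testing with native spreadsheet commands.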

## Abstract

Many resources exist to analyze cancer RNA data, but many of the algorithms and programs can appear as black boxes to non-bioinformaticians. To make RNA data more accessible, we here present modular barcoding, an approach predicated on the idea that cancer type-specific modules derived from high-quality, “gold standard” datasets will also be of high quality. Key to the use of these modules is their direct visualization, which can be done in spreadsheet programs in a color-coded way, essentially creating interactive heatmaps and visual gene set enrichments. We illustrate a variety of uses, including cancer subtype analyses, novel gene–gene and gene–clinical relationships, the inference of novel gene functions, and single-cell RNAseq analysis. Finally, we provide the tools for users to create their own modules, which will further improve their quality over time as single-cell RNAseq resolution advances. Modular barcoding is a user-friendly, tractable, yet powerful approach to make novel transcriptomic discoveries.
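The abstract does not state how a module is turned into a per-sample value for the color-coded spreadsheet view. A common convention, assumed here and not confirmed as the authors' method, is to z-score each gene across samples and average the z-scores of a module's member genes. The hypothetical helpers `module_scores` and `barcode_symbols` below sketch that idea; the three-symbol output stands in for a spreadsheet's three-color conditional formatting.

```python
import numpy as np


def module_scores(expr, genes, modules):
    """Hypothetical sketch: score every sample for every module.

    expr    -- 2D array, genes x samples (e.g. log-transformed expression)
    genes   -- gene symbols, one per row of expr
    modules -- dict mapping module name -> list of member gene symbols
    Returns a dict mapping module name -> 1D array of per-sample scores.
    """
    expr = np.asarray(expr, dtype=float)
    # z-score each gene across samples so genes on different scales are comparable
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant genes get z = 0 instead of a division by zero
    z = (expr - mean) / std

    row_of = {g: i for i, g in enumerate(genes)}
    scores = {}
    for name, members in modules.items():
        rows = [row_of[g] for g in members if g in row_of]
        if not rows:
            continue  # skip modules with no genes measured in this dataset
        # module score = mean z-score of the member genes, per sample
        scores[name] = z[rows].mean(axis=0)
    return scores


def barcode_symbols(scores, threshold=0.5):
    """Map scores to '+', '0', '-' (a stand-in for a 3-color spreadsheet scale)."""
    return ["+" if s > threshold else "-" if s < -threshold else "0" for s in scores]
```

For example, with genes `A` and `B` rising together across three samples, a module containing both scores low, neutral, then high, which renders as the barcode `['-', '0', '+']`. Averaging z-scores keeps each module's value on a common scale, so cells can share one color rule when laid out side by side.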

A challenge with studying cancer transcriptomes is distilling the wealth of information into manageable portions. In this resource, we develop an approach that creates and assembles cancer type-specific gene expression modules into flexible barcodes, allowing for adaptation to a wide variety of uses. Specifically, we propose that modules derived organically from high-quality gold standards such as The Cancer Genome Atlas (TCGA) can accurately capture and describe functionally related genes that are relevant to specific cancer types. We show that such modules can: (1) uncover novel gene relationships and nominate new functional memberships, (2) improve and speed up analysis of smaller or lower-resolution datasets, (3) re-create and expand known cancer subtyping schemes, (4) act as a “decoder” to bridge seemingly disparate established gene signatures, and (5) efficiently apply single-cell RNA sequencing information to other datasets. Moreover, such modules can be used in conjunction with native spreadsheet program commands to create a powerful and rapid approach to hypothesis generation and testing that is readily accessible to non-bioinformaticians. Finally, we provide tools for users to create and interpret their own modules. Overall, the flexible modular nature of the proposed barcoding provides a user-friendly approach to rapidly decoding transcriptome-wide data for research or, potentially, clinical uses.
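How modules are derived "organically" from TCGA is described in the full paper, not in this summary. Purely as an illustration, assume a module is a group of genes connected by strong co-expression in the reference cohort: two genes are linked when their Pearson correlation across samples is at least `r_min`, and each connected group of at least `min_size` genes becomes a module. `derive_modules`, `r_min`, and `min_size` are hypothetical names, and this thresholded-correlation grouping is a stand-in technique, not the authors' algorithm.

```python
import numpy as np


def derive_modules(expr, genes, r_min=0.7, min_size=3):
    """Hypothetical sketch: group co-expressed genes from a reference dataset.

    expr  -- 2D array, genes x samples from a high-quality reference cohort
    genes -- gene symbols, one per row of expr
    Genes are linked when their Pearson r >= r_min; each connected group of
    at least min_size genes is returned as a sorted list of gene symbols.
    """
    expr = np.asarray(expr, dtype=float)
    # np.corrcoef treats rows as variables, giving a gene x gene matrix.
    # Constant genes yield NaN (with a RuntimeWarning); NaN never passes the threshold.
    corr = np.corrcoef(expr)

    seen = [False] * len(genes)
    modules = []
    for start in range(len(genes)):
        if seen[start]:
            continue
        # traverse the thresholded correlation graph from this seed gene
        component, stack = [], [start]
        seen[start] = True
        while stack:
            i = stack.pop()
            component.append(i)
            for j in np.flatnonzero(corr[i] >= r_min):
                if not seen[j]:
                    seen[j] = True
                    stack.append(j)
        if len(component) >= min_size:
            modules.append(sorted(genes[k] for k in component))
    return modules
```

Under these assumptions, a reference matrix in which genes `A`, `B`, and `C` rise together while `D` falls and `E` oscillates yields the single module `['A', 'B', 'C']`; the resulting gene lists could then be passed to a scoring step to build barcodes for a smaller or lower-resolution dataset.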

## Linked entities

- **Diseases:** cancer (MONDO:0004992)

## Full-text entities

- **Diseases:** Cancer (MESH:D009369)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11120226/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC11120226/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC11120226/full.md

---
Source: https://tomesphere.com/paper/PMC11120226