# A Global Assessment of the Transcription-Dependent Single Nucleotide Variants Relies on the Characteristics of RNA-Sequencing Technologies

**Authors:** Xia Zhang, Jiawei Liu, Yabing Zhu, Guixue Hou, Mingzhou Bai, Yuxin Li, Wenbo Cui, Siqi Liu

PMC · DOI: 10.3390/biom16020211 · Biomolecules · 2026-01-29

## TL;DR

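This study introduces TSCS, a machine learning classifier that combines short- and long-read RNA sequencing data to improve detection of transcript-level single nucleotide variants (SNVs), including those arising from RNA editing, in colorectal cancer cell lines.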

## Contribution

TSCS combines short- and long-read RNA-seq data to more accurately classify transcript SNVs, revealing new insights into RNA editing in cancer.

## Key findings

- TSCS detected 31.83% more transcript SNVs on average than established tools (GATK, BCFtools, Longshot, RED_ML).
- Approximately 40% of transcript SNVs in cancer cell lines were attributed to RNA editing (e-tSNVs).
- New patterns in e-tSNVs were identified that do not fit known RNA editing categories.

## Abstract

Single nucleotide variants (SNVs) are crucial in cancer occurrence and development. SNVs at the transcriptomic level generally arise from either genomic variants (g-tSNVs) or RNA editing (e-tSNVs). The types and quantities of e-tSNVs remain a subject of debate because RNA editing processes are still poorly understood. Herein, we developed TSCS (Transcript SNVs Classifier relying on complementary sequencings), a machine learning classifier that integrates short-read (MGI) and long-read (PacBio) RNA-seq data to accurately distinguish true transcript SNVs using stringent criteria. Applied to five colorectal cancer cell lines (HCT15, LoVo, SW480, SW620, and HCT116), TSCS demonstrated superior accuracy and sensitivity, outperforming established tools (GATK, BCFtools, Longshot, RED_ML). It increased the total number of detected transcript SNVs by 31.83% on average, with g-tSNV and e-tSNV counts exceeding those of conventional methods by >1-fold and >2-fold, respectively. TSCS achieved mean recall rates of 75.3% for g-tSNVs and 77.2% for e-tSNVs. Notably, for the first time, e-tSNVs were found to make up a relatively large proportion, approximately 40%, of the total transcript SNVs in cancer cell lines. Of the identified e-tSNVs, 80% were attributed to known RNA editing types, whereas the remainder did not fall into any known category. Importantly, the e-tSNVs uniquely detected in this study showed distinct patterns in SNV types and genomic locations. Additionally, the transcript SNVs called by TSCS were partially confirmed using experimental approaches such as Sanger sequencing, RNC-seq, and mass spectrometry. This study lays the foundation for surveying and appraising cancer-related e-tSNVs.
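
To make the g-tSNV / e-tSNV distinction and the recall figures concrete, the following is a minimal illustrative sketch, not the TSCS implementation (which is a trained classifier whose features are not described in this summary). It assumes a simple rule: a transcript SNV also present in a genomic call set is treated as a g-tSNV, and any other is treated as a candidate e-tSNV. The `SNV` record, the function names, and the unstranded substitution table (A>G/T>C for A-to-I editing, C>T/G>A for C-to-U editing) are all assumptions introduced here for illustration.

```python
from typing import Iterable, NamedTuple, Optional, Set, Tuple


class SNV(NamedTuple):
    """Hypothetical variant record: chromosome, 1-based position, ref and alt base."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Assumption: canonical editing types as seen in unstranded data.
# A-to-I editing reads as A>G (or T>C on the opposite strand);
# C-to-U editing reads as C>T (or G>A on the opposite strand).
KNOWN_EDITING = {
    ("A", "G"): "A-to-I",
    ("T", "C"): "A-to-I",
    ("C", "T"): "C-to-U",
    ("G", "A"): "C-to-U",
}


def split_by_origin(
    transcript_snvs: Iterable[SNV], genomic_snvs: Iterable[SNV]
) -> Tuple[Set[SNV], Set[SNV]]:
    """Partition transcript SNVs into (g-tSNVs, candidate e-tSNVs).

    A transcript SNV with an identical genomic call counts as a g-tSNV;
    everything else is a candidate e-tSNV under this simplified rule.
    """
    genomic = set(genomic_snvs)
    transcript = set(transcript_snvs)
    g_tsnvs = transcript & genomic
    e_tsnvs = transcript - genomic
    return g_tsnvs, e_tsnvs


def known_editing_type(snv: SNV) -> Optional[str]:
    """Return the canonical editing type for a substitution, or None if unknown."""
    return KNOWN_EDITING.get((snv.ref, snv.alt))


def recall(called: Iterable[SNV], truth: Iterable[SNV]) -> float:
    """Fraction of the truth set that was recovered by the caller."""
    truth_set = set(truth)
    if not truth_set:
        raise ValueError("truth set must not be empty")
    return len(set(called) & truth_set) / len(truth_set)
```

In this sketch, recall is computed against whatever reference set is supplied; how the reference sets behind the reported 75.3% and 77.2% were built is not stated in this summary.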

## Linked entities

- **Diseases:** cancer (MONDO:0004992), colorectal cancer (MONDO:0005575)

## Full-text entities

- **Genes:** H3P16 (H3 histone pseudogene 16) [NCBI Gene 644914] {aka H3.6, H3F3AP6, p21}, SKP2 (S-phase kinase associated protein 2) [NCBI Gene 6502] {aka FBL1, FBXL1, FLB1, p45}, CDC14B (cell division cycle 14B) [NCBI Gene 8555] {aka CDC14B3, Cdc14B1, Cdc14B2, hCDC14B}, GAL (galanin and GMAP prepropeptide) [NCBI Gene 51083] {aka ETL8, GAL-GMAP, GALN, GLNN, GMAP}, AZIN1 (antizyme inhibitor 1) [NCBI Gene 51582] {aka AZI, AZI1, AZIA1, OAZI, OAZIN, ODC1L}, DCTN6 (dynactin subunit 6) [NCBI Gene 10671] {aka WS-3, WS3, p27}
- **Diseases:** glioblastoma (MESH:D005909), liver cancer (MESH:D006528), esophageal carcinoma (MESH:D004938), tumorigenesis (MESH:D063646), gastric cancer (MESH:D013274), Cancer (MESH:D009369), breast (MESH:D061325), CRC (MESH:D015179), injury to (MESH:D014947)
- **Chemicals:** ethanol (MESH:D000431), poly(A) (MESH:D011061), B (MESH:D001895), sodium acetate (MESH:D019346), thiourea (MESH:D013890), DTT (MESH:D004229), SDS (MESH:D012967), isoamyl alcohol (MESH:C029683), isopropanol (MESH:D019840), cycloheximide (MESH:D003513), chloroform (MESH:D002725), sucrose (MESH:D013395), iodoacetamide (MESH:D007460), TRIzol (MESH:C411644), phenol (MESH:D019800), CO2 (MESH:D002245), acetonitrile (MESH:C032159), Triton X-100 (MESH:D017830), streptomycin (MESH:D013307), urea (MESH:D014508), formic acid (MESH:C030544), penicillin (MESH:D010406), DMEM (-), TE (MESH:D013691)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** HCT15 — Homo sapiens (Human), Colon adenocarcinoma, Cancer cell line (CVCL_0292), HCT116 — Homo sapiens (Human), Colon carcinoma, Cancer cell line (CVCL_0291), LoVo — Homo sapiens (Human), Colon adenocarcinoma, Cancer cell line (CVCL_0399), CCL- — Mus musculus (Mouse), Undefined cell line type (CVCL_M023), SW620 — Homo sapiens (Human), Colon adenocarcinoma, Cancer cell line (CVCL_0547), SW480 — Homo sapiens (Human), Colon adenocarcinoma, Cancer cell line (CVCL_0546), S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12937670/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12937670/full.md

## References

71 references — full list in the complete paper: https://tomesphere.com/paper/PMC12937670/full.md

---
Source: https://tomesphere.com/paper/PMC12937670