# To be or not to be a protein coding mutation, that’s the question!

**Authors:** Dylan De Groote, Daniele Pepe, Xander Janssens, Kim De Keersmaecker

PMC · DOI: 10.1093/nargab/lqaf168 · NAR Genomics and Bioinformatics · 2025-11-21

## TL;DR

Annotating a genetic variant against a canonical reference transcript, rather than the transcript that is actually expressed, can mislabel it as protein-coding or noncoding and distort the interpretation of its role in disease.

## Contribution

The paper argues for integrating DNA- and RNA-sequencing data so that each variant is annotated against the transcripts expressed in the relevant tissue or disease sample, rather than against a default canonical transcript.

## Key findings

- Misannotations occur when variants are annotated to canonical transcripts instead of expressed ones.
- RNA sequencing data can reveal which transcripts are expressed in specific tissues.
- Integrating DNA and RNA sequencing improves the functional interpretation of genetic variants.
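
The idea behind these findings can be illustrated with a minimal sketch. It is not the authors' pipeline: the transcript names, coordinates, TPM values, and the `min_tpm` cutoff are all hypothetical toy values, strand and splice sites are ignored, and the labels are deliberately coarse. It only shows how the same genomic position can receive a different annotation once the expressed transcript is used in place of the canonical one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Transcript:
    """Toy transcript model (hypothetical; 1-based inclusive genomic coordinates)."""
    name: str
    exons: Tuple[Tuple[int, int], ...]
    cds: Optional[Tuple[int, int]]  # genomic CDS span, None for a noncoding transcript
    canonical: bool = False


def classify(pos: int, tx: Transcript) -> str:
    """Coarse label for a genomic position relative to a single transcript."""
    start = min(s for s, _ in tx.exons)
    end = max(e for _, e in tx.exons)
    if not start <= pos <= end:
        return "outside_transcript"  # e.g. upstream/promoter or intergenic
    if not any(s <= pos <= e for s, e in tx.exons):
        return "intronic"
    if tx.cds is not None and tx.cds[0] <= pos <= tx.cds[1]:
        return "coding"
    return "noncoding_exon"  # UTR or exon of a noncoding transcript


def annotate(pos: int, transcripts: List[Transcript],
             tpm: Dict[str, float], min_tpm: float = 1.0) -> dict:
    """Compare the canonical annotation with annotations on expressed transcripts.

    `min_tpm` is an assumed, arbitrary expression cutoff for this sketch.
    """
    canonical = next(t for t in transcripts if t.canonical)
    expressed = [t for t in transcripts if tpm.get(t.name, 0.0) >= min_tpm]
    canonical_label = classify(pos, canonical)
    expressed_labels = {t.name: classify(pos, t) for t in expressed}
    # Flag the variant if the canonical transcript is not expressed, or if any
    # expressed transcript yields a different label than the canonical one.
    needs_review = (
        canonical.name not in expressed_labels
        or any(lbl != canonical_label for lbl in expressed_labels.values())
    )
    return {
        "canonical_label": canonical_label,
        "expressed_labels": expressed_labels,
        "needs_review": needs_review,
    }


# Toy example: the canonical transcript is silent in this sample, and the
# expressed alternative transcript starts downstream of the variant.
transcripts = [
    Transcript("TX_CANONICAL", ((100, 200), (300, 400)), (150, 350), canonical=True),
    Transcript("TX_ALT", ((250, 400),), (320, 380)),
]
tpm = {"TX_CANONICAL": 0.0, "TX_ALT": 25.0}
result = annotate(180, transcripts, tpm)
print(result)
```

In this toy case the variant at position 180 falls in the coding sequence of the canonical transcript but lies upstream of the only expressed transcript, so it is flagged for review rather than reported as a coding mutation.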

## Abstract

Accurate annotation of genetic variants, that is, distinguishing whether they affect protein-coding or noncoding genomic regions, is crucial for evaluating their potential role in disease development. Prominent examples have been identified of variants that for many years were considered coding missense or synonymous mutations targeting one gene and that recently turned out to be noncoding variants, sometimes even modulating a regulatory region shared by multiple genes. These errors arose from annotating against a canonical reference transcript when an alternative transcript, relative to which the mutations have a different annotation, was the one actually expressed. Unfortunately, this practice of annotating genetic variants against a reference transcript, without verifying whether that transcript is expressed or whether the mutation changes which transcript is expressed, is still widespread. However, the adoption of RNA sequencing and the availability of these data in online portals make it possible to verify which transcripts are expressed in relevant tissues. Integrating DNA- and RNA-sequencing data, so that detected DNA mutations are annotated relative to the transcripts that RNA sequencing detects in the corresponding tissue or disease sample, avoids misinterpreting noncoding variants as coding and vice versa, thereby improving the functional interpretation of genetic variants.

## Full-text entities

- **Genes:** BAP1 (BRCA1 associated deubiquitinase 1) [NCBI Gene 8314] {aka HUCEP-13, KURIS, TPDS1, UBM2, UCHL2, UVM2}, KNSTRN (kinetochore localized astrin (SPAG5) binding protein) [NCBI Gene 90417] {aka C15orf23, HSD11, ROCHIS, SKAP, TRAF4AF1}, DXO (decapping exoribonuclease) [NCBI Gene 1797] {aka DOM3L, DOM3Z, NG6}, LGR5 (leucine rich repeat containing G protein-coupled receptor 5) [NCBI Gene 8549] {aka FEX, GPR49, GPR67, GRP49, HG38}, WHR1 (winged helix repair factor 1) [NCBI Gene 8859] {aka D6S60, D6S60E, G11, HLA-RP1, RP1, STK19}, KRAS (KRAS proto-oncogene, GTPase) [NCBI Gene 3845] {aka C-K-RAS, CFC2, K-RAS2A, K-RAS2B, K-RAS4A}, BCL2L12 (BCL2 like 12) [NCBI Gene 83596], WNK1 (WNK lysine deficient protein kinase 1) [NCBI Gene 65125] {aka HSAN2, HSN2, KDP, PPP1R167, PRKWNK1, PSK}, TERT (telomerase reverse transcriptase) [NCBI Gene 7015] {aka CMM9, DKCA2, DKCB4, EST2, PFBMFT1, TCS1}, IRF3 (interferon regulatory factor 3) [NCBI Gene 3661] {aka IIAE7}, CDKN1A (cyclin dependent kinase inhibitor 1A) [NCBI Gene 1026] {aka CAP20, CDKN1, CIP1, MDA-6, P21, SDI1}, TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, NRAS (NRAS proto-oncogene, GTPase) [NCBI Gene 4893] {aka ALPS4, CMNS, N-ras, NCMS, NRAS1, NS6}, SCN1A (sodium voltage-gated channel alpha subunit 1) [NCBI Gene 6323] {aka DEE6, DEE6A, DEE6B, DRVT, EIEE6, FEB3}, STK11 (serine/threonine kinase 11) [NCBI Gene 6794] {aka LKB1, PJS, hLKB1}, COL4A3 (collagen type IV alpha 3 chain) [NCBI Gene 1285] {aka ATS2, ATS3, ATS3A, ATS3B, BFH2}
- **Diseases:** kidney renal clear cell carcinoma (MESH:D002292), aneuploidy (MESH:D000782), glioblastoma (MESH:D005909), Alport syndrome (MESH:D009394), epilepsy (MESH:D004827), genetic disease (MESH:D030342), hereditary kidney disorder (MESH:D007680), SCCs (MESH:D002294), melanoma skin cancers (MESH:D012878), carcinogenesis (MESH:D063646), Cancer (MESH:D009369), head and neck cancer (MESH:D006258), melanoma (MESH:D008545)
- **Chemicals:** GDP (MESH:D006153), GTP (MESH:D006160)
- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]
- **Mutations:** 49665874 C > T, G12D, 1221013 G > A, 166043691 A > G, 40382906 C > T, 52408496 T > C, p.Asp674Gly, 40382931 G > A, p.Asp89Asn, 31972346 C > T, p.Thr255Thr
- **Cell lines:** Mel-ST — Homo sapiens (Human), Melanoma, Cancer cell line (CVCL_7145)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12634407/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12634407/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/PMC12634407/full.md

---
Source: https://tomesphere.com/paper/PMC12634407