# Large-scale RNA-Seq Transcriptome Analysis of 4043 Cancers and 548 Normal Tissue Controls across 12 TCGA Cancer Types

**Authors:** Li Peng, Xiu Wu Bian, Di Kang Li, Chuan Xu, Guang Ming Wang, Qing You Xia, Qing Xiong

PMC · DOI: 10.1038/srep13413 · 2015-08-21

## TL;DR

This study analyzed RNA-Seq data from 4043 cancer and 548 normal tissue samples across 12 TCGA cancer types to identify gene expression patterns that distinguish cancers from normal tissues and one cancer type from another.

## Contribution

The study introduces seven cross-cancer gene signatures, a 14-gene cancer-versus-normal signature derived from them, and a lung cancer-specific gene signature containing SFTPA1 and SFTPA2.

## Key findings

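- A 14-gene signature differentiates cancerous from normal samples, with leave-one-out cross-validation (LOOCV) accuracy between 88.17% (LIHC) and 99.10% (LUSC) across seven cancer types.
- A lung cancer-specific gene signature containing SFTPA1 and SFTPA2 distinguishes lung cancer from other cancers, with LOOCV accuracy of 95.68% on TCGA data and 100% on the GSE5364 dataset.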
- The gene signatures reveal transcriptional programs linked to cancer development and progression.

## Abstract

The Cancer Genome Atlas (TCGA) has accrued RNA-Seq-based transcriptome data for more than 4000 cancer tissue samples across 12 cancer types, but translating these data into biological insights remains a major challenge. We analyzed and compared the transcriptomes of 4043 cancer and 548 normal tissue samples from 12 TCGA cancer types, and created a comprehensive catalog of gene expression alterations for each cancer type. By clustering genes into co-regulated gene sets, we identified seven cross-cancer gene signatures altered across a diverse panel of primary human cancer samples. A 14-gene signature extracted from these seven cross-cancer gene signatures precisely differentiated between cancerous and normal samples: the predictive accuracies of leave-one-out cross-validation (LOOCV) were 92.04%, 96.23%, 91.76%, 90.05%, 88.17%, 94.29%, and 99.10% for BLCA, BRCA, COAD, HNSC, LIHC, LUAD, and LUSC, respectively. A lung cancer-specific gene signature, containing the SFTPA1 and SFTPA2 genes, accurately distinguished lung cancer from other cancer samples: the predictive accuracies of LOOCV for the TCGA and GSE5364 data were 95.68% and 100%, respectively. These gene signatures provide rich insights into the transcriptional programs that trigger tumorigenesis and metastasis, and many genes in the signature gene panels may be of significant value to the diagnosis and treatment of cancer.
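
To make the LOOCV figures in the abstract concrete, here is a minimal sketch of how a leave-one-out accuracy can be computed for a gene signature. It is illustrative only: the abstract does not say which classifier the authors used, so the nearest-centroid rule below is an assumption, and the expression values are synthetic (the helper `make_synthetic` and its parameters are hypothetical, not TCGA data).

```python
import random

def centroid(rows):
    """Mean expression vector of a list of samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(train_x, train_y, sample):
    """Nearest-centroid rule (an assumption, not the paper's stated method)."""
    best_label, best_d = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        d = sq_dist(centroid(rows), sample)
        if d < best_d:
            best_label, best_d = label, d
    return best_label

def loocv_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, score the held-out call."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

def make_synthetic(n_per_class=20, n_genes=14, shift=3.0, seed=0):
    """Hypothetical data: 14 'genes' whose mean differs between the two classes."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, mu in (("normal", 0.0), ("cancer", shift)):
        for _ in range(n_per_class):
            xs.append([rng.gauss(mu, 1.0) for _ in range(n_genes)])
            ys.append(label)
    return xs, ys

xs, ys = make_synthetic()
accuracy = loocv_accuracy(xs, ys)
print(f"LOOCV accuracy on synthetic data: {accuracy:.2%}")
```

Each sample is predicted by a model that never saw it, so the reported fraction is an estimate of out-of-sample accuracy. With real data, `xs` would hold the signature genes' expression values per sample and `ys` the tissue labels.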

## Linked entities

- **Genes:** SFTPA1 (surfactant protein A1) [NCBI Gene 653509], SFTPA2 (surfactant protein A2) [NCBI Gene 729238]
- **Diseases:** BLCA (MONDO:0005611), BRCA (MONDO:0006256), COAD (MONDO:0002271), lung cancer (MONDO:0005138)

## Full-text entities

- **Genes:** GIMAP8 (GTPase, IMAP family member 8) [NCBI Gene 155038] {aka IAN-9, IAN6, IAN9, IANT}, DNMT1 (DNA methyltransferase 1) [NCBI Gene 1786] {aka ADCADN, AIM, CXXC9, DNMT, HSN1E, MCMT}, sub (subito) [NCBI Gene 44870] {aka CG12298, DmKlp54E, DmSub, Dmel\CG12298, Dub, KIF 20A}, GIMAP6 (GTPase, IMAP family member 6) [NCBI Gene 474344] {aka IAN-2, IAN-6, IAN2, IAN6}, SDC1 (syndecan 1) [NCBI Gene 6382] {aka CD138, SDC, SYND1, syndecan}, MKI67 (marker of proliferation Ki-67) [NCBI Gene 4288] {aka KIA, MIB-, MIB-1, PPP1R105}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, SFTPA2 (surfactant protein A2) [NCBI Gene 729238] {aka COLEC5, ILD2, PSAP, PSP-A, PSPA, SFTP1}, ADRB2 (adrenoceptor beta 2) [NCBI Gene 154] {aka ADRB2R, ADRBR, ARB2, B2AR, BAR, BETA2AR}, IL33 (interleukin 33) [NCBI Gene 90865] {aka C9orf26, DVS27, IL1F11, NF-HEV, NFEHEV}, RRM1 (ribonucleotide reductase catalytic subunit M1) [NCBI Gene 6240] {aka PEOB6, R1, RIR1, RR1}, FANCI (Fanconi anemia complementation group I) [NCBI Gene 35895] {aka CG13745, Dmel\CG13745}, RRM2 (ribonucleotide reductase regulatory subunit M2) [NCBI Gene 6241] {aka C2orf48, R2, RR2, RR2M}, ABRA (actin binding Rho activating protein) [NCBI Gene 137735] {aka STARS}, TYMS (thymidylate synthetase) [NCBI Gene 7298] {aka DKCD, HST422, TMS, TS}, CIP2A (cellular inhibitor of PP2A) [NCBI Gene 57650] {aka KIAA1524, NOCIVA, p90}, CLDN18 (claudin 18) [NCBI Gene 51208] {aka SFTA5, SFTPJ}, GJC1 (gap junction protein gamma 1) [NCBI Gene 10052] {aka CX45, GJA7}, FOXM1 (forkhead box M1) [NCBI Gene 2305] {aka FKHL16, FOXM1A, FOXM1B, FOXM1C, HFH-11, HFH11}, PSRC1 (proline and serine rich coiled-coil 1) [NCBI Gene 84722] {aka DDA3, FP3214}, SKA1 (spindle and kinetochore associated complex subunit 1) [NCBI Gene 220134] {aka C18orf24}, CCNB1 (cyclin B1) [NCBI Gene 891] {aka CCNB}, HMMR (hyaluronan mediated motility receptor) [NCBI Gene 3161] {aka CD168, IHABP, RHAMM}, BRIP1 (BRCA1 
interacting DNA helicase 1) [NCBI Gene 83990] {aka BACH1, FANCJ, OF}, ABCA3 (ATP binding cassette subfamily A member 3) [NCBI Gene 21] {aka ABC-C, ABC3, EST111653, LBM180, SMDP3}, RB1 (RB transcriptional corepressor 1) [NCBI Gene 5925] {aka OSRC, PPP1R130, RB, p105-Rb, p110-RB1, pRb}, WNT7A (Wnt family member 7A) [NCBI Gene 7476] {aka SANTOS, Wnt-7a}, AURKA (aurora kinase A) [NCBI Gene 6790] {aka AIK, ARK1, AURA, BTAK, PPP1R47, STK15}, FDXR (ferredoxin reductase) [NCBI Gene 2232] {aka ADR, ADXR, ANOA, MMDS9B}, UHRF1 (ubiquitin like with PHD and ring finger domains 1) [NCBI Gene 29128] {aka ICBP90, Np95, RNF106, TDRD22, hNP95, hUHRF1}, SON (SON DNA and RNA binding protein) [NCBI Gene 6651] {aka BASS1, C21orf50, DBP-5, NREBP, SON3, TOKIMS}, IFNAR2 (interferon alpha and beta receptor subunit 2) [NCBI Gene 3455] {aka IFN-R, IFN-R-2, IFN-alpha-REC, IFNABR, IFNARB, IMD45}, ABCA1 (ATP binding cassette subfamily A member 1) [NCBI Gene 19] {aka ABC-1, ABC1, CERP, HDLCQTL13, HDLDT1, HPALP1}, DONSON (DNA replication fork stabilization factor DONSON) [NCBI Gene 29980] {aka B17, C21orf60, MGORS10, MIMIS, MISSLA}, TK1 (thymidine kinase 1) [NCBI Gene 7083], HDGF (heparin binding growth factor) [NCBI Gene 3068] {aka HMG1L2}, UBE2C (ubiquitin conjugating enzyme E2 C) [NCBI Gene 11065] {aka UBCH10, dJ447F3.2}, ROS1 (ROS proto-oncogene 1, receptor tyrosine kinase) [NCBI Gene 6098] {aka MCF3, ROS, c-ros-1}, CDC6 (cell division cycle 6) [NCBI Gene 990] {aka CDC18L, HsCDC18, HsCDC6, MGORS5}, Klp3A (Kinesin-like protein at 3A) [NCBI Gene 31240] {aka 3A7Kin, BcDNA:LD21815, CG8590, DmKLP3A, DmKlp3A, Dmel\CG8590}, moi (modigliani) [NCBI Gene 7354473] {aka 0967/13, CG31241, CG42350, DTL, DTL/Moi, DTLu}, TOP2A (DNA topoisomerase II alpha) [NCBI Gene 7153] {aka TOP2, TOP2alpha, TOPIIA, TP2A}, CDKN3 (cyclin dependent kinase inhibitor 3) [NCBI Gene 1033] {aka CDI1, CIP2, KAP, KAP1}, Gprc5a (G protein-coupled receptor, family C, group 5, member A) [NCBI Gene 232431] {aka Rai3, Raig1}, BIRC5 
(baculoviral IAP repeat containing 5) [NCBI Gene 332] {aka API4, EPR-1}, EZH2 (enhancer of zeste 2 polycomb repressive complex 2 subunit) [NCBI Gene 2146] {aka ENX-1, ENX1, EZH2b, KMT6, KMT6A, WVS}, CENPH (centromere protein H) [NCBI Gene 64946], SKP2 (S-phase kinase associated protein 2) [NCBI Gene 6502] {aka FBL1, FBXL1, FLB1, p45}, FGFR1 (fibroblast growth factor receptor 1) [NCBI Gene 2260] {aka BFGFR, CD331, CEK, ECCL, FGFBR, FGFR-1}, AURKB (aurora kinase B) [NCBI Gene 9212] {aka AIK2, AIM-1, AIM1, ARK-2, ARK2, AurB}, E2F7 (E2F transcription factor 7) [NCBI Gene 144455], Fen1 (Flap endonuclease 1) [NCBI Gene 36887] {aka CG8648, DmFEN-1, Dmel\CG8648, EG:EG0003.3}, PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha) [NCBI Gene 5290] {aka CCM4, CLAPO, CLOVE, CWS5, HMH, MCAP}, CCNE1 (cyclin E1) [NCBI Gene 898] {aka CCNE, pCCNE1}, BRCA1 (BRCA1 DNA repair associated) [NCBI Gene 672] {aka BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4}, KAT5 (lysine acetyltransferase 5) [NCBI Gene 10524] {aka ESA1, HTATIP, HTATIP1, NEDFASB, PLIP, TIP}, NPHP1 (nephrocystin 1) [NCBI Gene 4867] {aka JBTS4, NPH1, SLSN1}, RAD51 (RAD51 recombinase) [NCBI Gene 5888] {aka BRCC5, FANCR, HRAD51, HsRad51, HsT16930, MRMV2}, TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, GPRC5A (G protein-coupled receptor class C group 5 member A) [NCBI Gene 9052] {aka GPCR5A, PEIG-1, RAI3, RAIG1, TIG1}
- **Diseases:** ovarian cancer (MESH:D010051), kidney renal clear cell carcinoma (MESH:D002292), colon (MESH:D003108), nephronophthisis (MESH:C537699), thyroid (MESH:D013966), Adenocarcinoma (MESH:D000230), bladder cancer (MESH:D001749), kidney cystic disease (MESH:D052177), liver metastasis (MESH:D009362), Carcinogenesis (MESH:D063646), urothelial carcinomas (MESH:D014523), Retinoblastoma (MESH:D012175), Ciliary Motility Disorders (MESH:D002925), liver (MESH:D017093), TNBC (MESH:D064726), kidney cancers (MESH:D007680), lung inflammation (MESH:D011014), Cancer (MESH:D009369), acute and chronic lung disease (MESH:D055370), Viral Infections (MESH:D014777), non-small cell lung cancer (MESH:D002289), breast cancer (MESH:D001943), head and neck squamous cell carcinoma (MESH:D000077195), COAD (MESH:D029424), Pancreatic Diseases (MESH:D010182), lung squamous cell carcinoma (MESH:D002294), Respiratory Tract Diseases (MESH:D012140), LUAD lung adenocarcinoma (MESH:D000077192), colon adenocarcinoma (MESH:D003110), colorectal cancer (MESH:D015179), Lung diseases (MESH:D008171), Urogenital Abnormalities (MESH:D014564), Cross (MESH:C537866), pulmonary alveolar microlithiasis (MESH:C562405), bladder abnormalities (MESH:D001745), THCA thyroid carcinoma (MESH:D013964), carcinogenic (MESH:D011230), Lung Neoplasms (MESH:D008175), prostate cancer (MESH:D011471), FA (MESH:D005199), Cystitis (MESH:D003556), liver cancer (MESH:D006528), breast (MESH:D061325), Airway Obstruction (MESH:D000402), liver tumor (MESH:D008113), pulmonary fibrosis (MESH:D011658)
- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090], Drosophila melanogaster (fruit fly, species) [taxon 7227]
- **Mutations:** Asp(358)Ala
- **Cell lines:** S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232)

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC4544034/full.md

---
Source: https://tomesphere.com/paper/PMC4544034