Transcriptomic Profile of Oral Cancer Lesions: A Proof-of-Concept Pilot Study of FFPE Tissue Sections
Madison E. Richards, Micaela F. Beckman, Ernesto Martinez Duarte, Joel J. Napenas, Michael T. Brennan, Farah Bahrani Mougeot, Jean-Luc C. Mougeot

TL;DR
This study uses preserved tissue samples to compare gene activity in oral cancer and breast cancer, identifying unique gene patterns in oral cancer.
Contribution
The study demonstrates the feasibility of using FFPE samples to identify OSCC-specific transcriptomic profiles and potential biomarkers.
Findings
OSCC and BC FFPE samples showed distinct clustering in PCA and had 9194 differentially expressed genes.
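All 17 significant KEGG pathways were downregulated in OSCC relative to BC, including chemokine signaling and natural killer cell mediated cytotoxicity. PIK3 genes clustered within the PPI network of upregulated genes, and NIK/NF-kappaB signaling was the top GO term for the downregulated genes.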
KRT-family genes and periodontal pathogens were suggested as potential contributors to the progression from OPMD to OSCC.
Abstract
Oral squamous cell carcinoma (OSCC) is a malignancy that affects the oral mucosa and is characterized by indurated oral lesions. The RNAseq of formalin-fixed, paraffin-embedded (FFPE) samples is readily available in clinical settings. Such samples have long-term preservation and can provide highly accurate transcriptomic information regarding gene fusions, isoforms, and allele-specific expression. We determined differentially expressed genes using the transcriptomic profiles of oral potentially malignant disorder (OPMD) FFPE oral lesion samples of patients who developed OSCC over years. A technical comparison was completed comparing breast cancer (BC) FFPE publicly available data in this proof-of-concept pilot study. OSCC FFPE samples were collected from patients (N = 3) who developed OSCC 3 to 5 years following OPMD diagnosis (n = 3) and were analyzed using RNAseq. RNAseq sequences…
Funding
- Atrium Health Foundation Research fund and Comprehensive Cancer Institute
- Wake Forest Baptist Comprehensive Cancer Center’s NCI Cancer Center
Taxonomy
Topics: Oral Health Pathology and Treatment · Cancer-related molecular mechanisms research · Head and Neck Cancer Studies
1. Introduction
Oral squamous cell carcinomas (OSCCs) originate for the most part (~80%) from oral potentially malignant disorders (OPMDs), which are characterized by lesions with a high risk of malignant transformation [1,2]. OPMDs include oral lichen planus, leukoplakia, proliferative verrucous leukoplakia, oral graft versus host disease, oral submucous fibrosis, and erythroplakia. Approximately 4.5% of the world’s population may have OPMDs, with men being more frequently affected, likely due to greater use of tobacco and alcohol compared to women [3]. Other risk factors such as the use of betel nut derivatives, human papillomavirus (HPV) infection, oral microbiome dysbiosis, ill-fitting dentures, genetic alterations, compromised epigenetic regulation, a dysregulated tumor microenvironment, or combinations of these factors may contribute to the progression of OPMDs to OSCC [4,5,6,7,8,9].
A common theme among OPMDs moving towards malignant transformation is the loss of heterozygosity at loci on chromosomes 3, 9, and/or 17 [10,11,12,13,14]. The loss of genetic material at these genomic regions is considered an early marker of oral carcinogenesis, while the loss of genetic material at chromosomes 8 and 13 is associated with late-stage carcinoma [15]. Furthermore, tissue markers such as p53, EGFR, PD-L1, CD4+ and CD8+ T cells, TLR-2, TNF-α, IL-6, COX-2, CD34, TGF-β, and Mcm2 have been implicated in the transition from OPMD to cancer [9,11]. We have previously identified Haemophilus pittmaniae and Leptotrichia spp. as a multi-marker signature in a cohort of HPV-positive head and neck cancer (HNC) patients. This finding suggests that oral bacterial species may coexist with HPV within HPV-induced oral lesions in HNC patients, contributing to the transition to OSCC [6].
Current treatments for OPMDs include the mitigation of risk factors, the removal of moderate to severe lesions, and chemoprevention [8]. Less invasive approaches utilize anti-inflammatory drugs or topical medications [12,16]. The early screening of OPMDs is vital for timely diagnosis to minimize malignant transformation, yet screening procedures rely on the visual examination of lesions. Furthermore, the clinical assessment of suspicious lesions may include screening aids such as vital staining with toluidine blue and Lugol’s iodine, autofluorescence, chemiluminescence, narrow-band imaging, high-frequency ultrasound, and biomarker assessment from saliva, serum, or exfoliated cells [12,17,18,19,20,21,22,23,24]. There is currently no methodology to predict the likelihood of transformation of an individual lesion to OSCC.
The discovery of genetic biomarkers suitable to predict or pinpoint the stage of a lesion’s progression towards OSCC may create new opportunities for the development of treatment strategies for OPMDs. With the prospect of investigating OPMD lesions in future studies to generate a predictor algorithm, we first sought to determine, using the latest RNAseq technology, the transcriptomic profiles of oral lesion biopsies from patients who developed OSCC. Thus, we implemented an initial proof-of-concept pilot study of OSCC samples (N = 3). As an unrelated control group, we used transcriptomic RNAseq data obtained from the formalin-fixed paraffin-embedded samples of unrelated breast cancer biopsies (N = 6). The comparison yielded OSCC-relevant findings, thereby confirming that a proper methodology was used.
2. Results
The demographic information for OSCC patients (N = 3) is presented in Table 1. The overall analytical pipeline of the study is presented in Figure 1. The pathological features of the three patients are shown in Supplemental Figure S1.
RNAseq data for three OSCC and six BC formalin-fixed paraffin-embedded (FFPE) samples were obtained at an average read depth of 16.2 and 43.3 million reads per sample, with an average unique mapping of 82.44% and 79.75%, respectively. The STARv2.7.9a ‘genecounts’ module detected 27,237 and 30,343 genes in the OSCC and BC groups, respectively. DESeq2v1.40.2 determined that 9194 genes were differentially expressed (padj < 0.05), with 4466 being upregulated (OSCC > BC) and 4728 being downregulated (BC > OSCC). Restricting log2FoldChange (log2FC) to less than −2.0 or greater than 2.0 left 3319 genes, with 1271 being upregulated (OSCC > BC) and 2048 being downregulated (BC > OSCC). A volcano plot showing significant genes is presented in Figure 2. Top upregulated genes included KRT6B, SERPINB5, DSC3, PERP, and KRT5 (log2FC > 5.0; padj < 10^−200^) (Table 2a). Top downregulated genes included KRT19, GREB1, ARFGEF3, SERPINA3, and LONRF2 (log2FC < −4.0; padj < 10^−80^) (Table 2b). A list of all differentially expressed genes can be found in Supplemental File S1.
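The two-step filter described above (adjusted p-value first, then a fold-change cutoff in both directions) can be sketched as follows. This is a minimal illustration, not the authors' DESeq2 code: the `DegRow` type, the `filter_degs` helper, and all numeric values except the thresholds are assumptions made for the example, and `GENE_X` is a hypothetical gene.

```python
from typing import NamedTuple


class DegRow(NamedTuple):
    gene: str
    log2fc: float  # positive means higher in OSCC than in BC
    padj: float    # adjusted p-value; may be NaN for untested genes


def filter_degs(rows, padj_cutoff=0.05, lfc_cutoff=2.0):
    """Return (upregulated, downregulated) gene lists after both filters."""
    up, down = [], []
    for row in rows:
        # Written as "not <" so that NaN adjusted p-values are dropped too
        if not (row.padj < padj_cutoff):
            continue
        if row.log2fc > lfc_cutoff:
            up.append(row.gene)
        elif row.log2fc < -lfc_cutoff:
            down.append(row.gene)
    return up, down


# Toy rows: the numbers are invented for illustration only
toy_rows = [
    DegRow("KRT6B", 6.1, 1e-210),
    DegRow("KRT19", -9.3, 1e-92),
    DegRow("CHUK", 0.72, 0.01),    # significant, but below the fold-change cutoff
    DegRow("GENE_X", 3.0, 0.20),   # large change, but not significant
    DegRow("GENE_Y", 4.0, float("nan")),
]
up_genes, down_genes = filter_degs(toy_rows)
print(up_genes, down_genes)
```

A gene can therefore be counted among the 9194 significant genes and still be absent from the 3319 filtered genes if its fold change is modest.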
Principal Component Analysis (PCA) resulted in the first principal component accounting for 87% of the variance between OSCC and BC FFPE transcriptomic profiles. PCA also showed that BC and OSCC samples clustered into distinct groups (Figure 3). Pathviewv.1.40.0 determined 17 downregulated Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in the OSCC group compared to the BC group and was unable to identify any upregulated KEGG pathways. All 17 significant pathways and their involved genes are presented in Table 3. The Pathviewv.1.40.0 rendering of the top five most significant pathways can be found in Supplemental Figure S2a–e. The top significant pathways were ‘Chemokine signaling pathway’, ‘natural killer cell mediated cytotoxicity’, ‘NOD-like receptor signaling pathway’, ‘RIG-I-like receptor signaling pathway’, and ‘arginine and proline metabolism’.
From an input of 25 upregulated genes appearing in more than one KEGG pathway, 23 genes were returned in the protein-protein interaction (PPI) network by the Search Tool for the Retrieval of Interacting Genes/Proteins (STRINGv12.0) (p < 1 × 10^−16^). Enriched Gene Ontology Biological Processes (GO BPs) included multiple terms related to leukocytes and apoptosis as well as cell regulation and signaling (FDR < 1 × 10^−5^) (Figure 4a). Furthermore, 63 genes were returned in the PPI network from an input of 73 downregulated genes appearing in more than one KEGG pathway (p < 1 × 10^−16^). The top GO BP was determined to be NIK/NF-kappaB signaling (FDR = 1.19 × 10^−9^), with other significant GO BPs involving metabolic responses and signaling from external stimuli (Figure 4b).
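To make the "share of variance explained by the first principal component" concrete, the following sketch computes that ratio for a toy samples-by-genes matrix using power iteration on the covariance matrix. It is an illustration under stated assumptions, not the R/ggplot2 workflow used in the study: the function name `first_pc_variance_ratio` and the toy expression values are hypothetical.

```python
import math


def first_pc_variance_ratio(rows, iterations=200):
    """Fraction of total variance captured by the first principal component.

    rows: one list of expression values per sample (samples x genes).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Gene-by-gene sample covariance matrix
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    total_variance = sum(cov[j][j] for j in range(p))
    # Power iteration converges to the leading eigenvector; the uneven start
    # vector avoids starting exactly orthogonal to it on this toy data
    v = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector gives the leading eigenvalue
    top_eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    return top_eigenvalue / total_variance


# Toy log-scale expression: three "OSCC-like" and three "BC-like" samples
toy_matrix = [
    [10.0, 2.0, 5.0], [11.0, 1.0, 5.2], [10.5, 1.5, 4.9],
    [2.0, 10.0, 5.1], [1.0, 11.0, 5.0], [1.5, 9.5, 5.2],
]
ratio = first_pc_variance_ratio(toy_matrix)
print(f"PC1 explains {ratio:.1%} of the variance")
```

When two groups differ strongly along a shared set of genes, as in this toy matrix, nearly all of the variance falls on the first component, which is the pattern a high PC1 percentage reflects.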
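The STRING input described above consists of genes that appear in more than one differential KEGG pathway. A minimal sketch of that selection step is shown below; the helper name `genes_in_multiple_pathways` and the pathway memberships in `toy_pathways` are illustrative assumptions (the actual memberships are listed in Table 3), and `GENE_A` to `GENE_C` are hypothetical.

```python
from collections import Counter


def genes_in_multiple_pathways(pathway_genes, min_pathways=2):
    """Return the sorted genes that occur in at least `min_pathways` pathways."""
    counts = Counter()
    for genes in pathway_genes.values():
        counts.update(set(genes))  # count each gene once per pathway
    return sorted(g for g, n in counts.items() if n >= min_pathways)


# Illustrative memberships only; not taken from Table 3
toy_pathways = {
    "Chemokine signaling pathway": ["PIK3CB", "CHUK", "NFKB1", "GENE_A"],
    "Natural killer cell mediated cytotoxicity": ["PIK3CB", "GENE_B"],
    "NOD-like receptor signaling pathway": ["CHUK", "NFKB1", "GENE_C"],
}
shared_genes = genes_in_multiple_pathways(toy_pathways)
print(shared_genes)
```

The resulting gene list is what would be submitted to a PPI tool; splitting it by direction of regulation beforehand gives the separate up- and downregulated inputs.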
3. Discussion
In this proof-of-concept pilot study, we were able to distinguish the transcriptomic profiles of OSCC patients’ oral lesions from those of BC tumors using RNAseq of FFPE blocks. The comparison was based on data generated in our laboratory and data obtained from the Gene Expression Omnibus NCBI NIH database (GEO; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE58135, accessed on 24 October 2024) that could be appropriately normalized for comparison. KRT genes belong to the keratin gene family, are responsible for maintaining cellular integrity, and have been associated with a multitude of cancers [25]. Our results showed KRT genes as the most significant up- and downregulated genes among the filtered gene set (Table 2). After filtering, eight KRT genes were downregulated and 20 were upregulated (Supplemental File S1). KRT6B was the most significantly upregulated gene in the OSCC group. Another type II cytokeratin, KRT5, has been previously associated with squamous cell carcinoma (SNP IDs rs11170164 – chromosome 3:188370473 and rs607860) [26,27]. In our study, KRT5 was upregulated with p = 5.16 × 10^−236^ and log2FC > 5.0 (Figure 2, Table 2). The type I cytokeratin, KRT19, was the most significantly downregulated gene (p = 3.07 × 10^−92^; log2FC < −9.0). KRT19 has been associated with the progression of dysplasia in leukoplakia [28]. A 2019 study identified KRT31, KRT37, and KRT76 as significantly different between leukoplakia groups with and without dysplasia [29]. No KRT genes were found in any differential KEGG pathways in this study (Table 3).
We identified 17 significant KEGG pathways involving genes expressed in our OSCC samples (Table 3). All significant pathways were downregulated when compared to BC samples. Pathways included chemokine signaling, natural killer cell mediated cytotoxicity, apoptosis, and others known to be involved in both BC and OSCC progression. PIK3CB was present in 4 of the 17 (~24%) pathways, and PIK3 genes were tightly clustered within the STRINGv12.0 PPI network of upregulated genes (Table 3 and Figure 4a). PIK3CB encodes a kinase important for signaling downstream of receptors on the outer membranes of eukaryotic cells. PIK3CB can also activate neutrophils during injury or infection. Additionally, genes belonging to the PIK3 family have been reported as the most frequently mutated oncogenes in human cancer [30]. The upregulation of PIK3CB in multiple pathways and the tight grouping of many PIK3 genes within the PPI network indicate a possible increase in cell proliferation, which is characteristic of the OPMD to OSCC transition and BC transformation [31,32]. The many connections of CHUK and NFKB1 to other genes within the STRINGv12.0 PPI network of downregulated genes illustrate the complexity of these signaling relationships (Figure 4b).
There may also be external factors influencing the immune response leading to the progression of BC or OSCC, as suggested by significant GO BPs such as ‘positive regulation of macrophage derived foam cell differentiation’ and ‘cellular response to lipopolysaccharide’. Lipopolysaccharides, produced by oral bacteria such as Porphyromonas gingivalis, Prevotella intermedia, and Fusobacterium nucleatum, can activate macrophages, leading to the production of inflammatory cytokines that cause an immune response and tissue damage [33,34,35]. In OSCC, P. gingivalis has been shown to aid in the progression of cancer by (i) activating the expression of NF-kappaB and MAPK pathways, (ii) inhibiting apoptosis by activating JAK/STAT and PI3K/Akt, (iii) promoting angiogenesis by increasing expression of EENB2, (iv) increasing cell proliferation via PDCD4 inhibition and increased AP1 and CD1, and (v) evading the host immune system through the production of butyric acid, which causes T-cell and B-cell apoptosis and ensures its survival on gingival epithelial cells [34]. Increased abundances of periodontal pathogens in OPMDs have been reported, but more longitudinal studies at the species level are needed to clarify the mechanisms linking these bacteria to the OPMD to OSCC transition and the potential for therapeutic interventions [36,37].
Although Pathviewv.1.40.0 and Gagev2.50.0 determined that CHUK and NFKB1 were downregulated in multiple pathways, DESeq2v1.40.2 analysis determined their expression to be 0.72 and 0.88 log2FC higher in OSCC samples than in BC samples, respectively. This is likely due to Pathviewv.1.40.0 visualizing the net effect of multiple genes in a pathway. In other words, although CHUK and NFKB1 are shown to be upregulated by DESeq2v1.40.2, an upstream inhibitor of these genes with higher expression may cancel out that effect, causing them to be visualized as downregulated in the pathway. CHUK and NFKB1 are engaged in a complex crosstalk linked to cancer progression and drug resistance due to their ability to activate inflammatory responses and promote cell survival [38,39].
Although our analysis compared OSCC to BC samples, 19 significant genes overlapped with Farah and Fox’s 47 differentially expressed genes (DEGs), including the upregulation of ODC1 and LCN2 and the downregulation of COL1A1, COL11A1, and STAC2 in dysplastic leukoplakia samples (Supplemental File S1) [29]. Furthermore, a study investigating the RNAseq of human tongue OSCCs vs. healthy tongue epithelia in the same 20 patients had 2543 genes overlapping with our DEGs prior to filtering [40]. Of the 2543 genes, 1397 (~55%) showed the same direction of regulation (up, n = 780; down, n = 617) [40]. We also found 1554 DEGs in common (~38%) prior to filtering with a 2015 study by Conway et al., which compared FFPE tumor sections from OSCC patients to the adjacent healthy tissues of 19 patients [41]. Over 25% of these genes also showed the same direction of regulation (up, n = 126; down, n = 275). Despite these studies having considerably more samples than our study, we were able to confirm similar results, suggesting that OSCC transcriptomic profiles are highly conserved across different patient populations.
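As a small worked conversion, a log2 fold change can be turned back into a linear fold change with a power of two. The sketch below applies this to the two values quoted above; the helper name `log2fc_to_fold` is an assumption introduced for illustration.

```python
def log2fc_to_fold(log2fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2fc


chuk_fold = log2fc_to_fold(0.72)
nfkb1_fold = log2fc_to_fold(0.88)
cutoff_fold = log2fc_to_fold(2.0)  # the fold change implied by a cutoff of 2.0

print(f"CHUK: {chuk_fold:.2f}x, NFKB1: {nfkb1_fold:.2f}x, cutoff: {cutoff_fold:.0f}x")
```

Both genes are therefore less than twofold higher in OSCC, well under the fourfold change implied by a log2 fold-change cutoff of 2.0, which is consistent with a modest gene-level increase being outweighed at the pathway level.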
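The overlap comparisons above reduce to two questions: which genes are shared between two DEG lists, and how many of those change in the same direction. A minimal sketch is given below; the `overlap_concordance` helper and the toy dictionaries are assumptions for illustration, with `GENE_A` to `GENE_C` being hypothetical genes. The final lines re-check the percentage reported for the tongue OSCC comparison from the counts given in the text.

```python
def overlap_concordance(ours, theirs):
    """Compare two {gene: 'up' | 'down'} mappings.

    Returns the number of shared genes, how many agree in each direction,
    and the concordant fraction of the shared genes.
    """
    shared = ours.keys() & theirs.keys()
    same_up = sum(1 for g in shared if ours[g] == "up" and theirs[g] == "up")
    same_down = sum(1 for g in shared if ours[g] == "down" and theirs[g] == "down")
    fraction = (same_up + same_down) / len(shared) if shared else 0.0
    return {
        "shared": len(shared),
        "same_up": same_up,
        "same_down": same_down,
        "fraction": fraction,
    }


# Toy example; directions for the hypothetical genes are invented
ours = {"ODC1": "up", "LCN2": "up", "COL1A1": "down", "GENE_A": "up", "GENE_B": "down"}
theirs = {"ODC1": "up", "LCN2": "up", "COL1A1": "down", "GENE_A": "down", "GENE_C": "up"}
summary = overlap_concordance(ours, theirs)
print(summary)

# Arithmetic check of the reported figures: 780 up + 617 down out of 2543 shared
reported_fraction = (780 + 617) / 2543
print(f"{reported_fraction:.0%}")
```

Counting concordance only over shared genes keeps the percentage independent of how large either study's full DEG list is.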
Limitations
The sample size in this pilot study is small. While we met the DESeq2v1.40.2 recommendation of having at least three samples per condition, a larger sample size would reduce the chance of false positives. FFPE samples from breast cancer tumors served as a technical reference for comparison, but comparing OSCC samples to healthy oral mucosa or premalignant lesions would be ideal; however, publicly available data from FFPE samples using Illumina for RNAseq are scarce. In future studies, we will compare the transcriptomic profiles of OPMD (early/pre-cancerous) lesions of FFPE samples from patients who developed OSCC to those who did not. The microbial profiles of patients in conjunction with transcriptomic profiles may be further investigated longitudinally to confirm the involvement of P. gingivalis and other periodontal pathogens. This would allow us to design an algorithm intended for the prediction of the OPMD to OSCC transition using transcriptomic and/or microbial shifts in patients.
4. Materials and Methods
4.1. Sample Collection and Patient Characteristics
Formalin-fixed and paraffin-embedded (FFPE) OSCC oral lesion samples were obtained from the Atrium Health Biospecimen Repository, Atrium Health, Charlotte, NC, USA, which is associated with multiple clinical studies in which OSCC patients consented to genomic analyses. The study was approved by the Wake Forest University Institutional Review Board (IRB00109068) and qualified for expedited review under the Federal Regulations [45CFR46.110]. Cases were identified by retrospective review. Patients (N = 3) who developed OSCC 3–5 years after initial presentation with OPMD oral lesions were identified through a patient record review and selected without a priori gender or race bias.
4.2. Initial Processing of FFPE Slides
Hematoxylin and eosin (H&E)-stained slides were independently reviewed by a single pathologist and classified in the low-grade dysplasia (LGD), moderate dysplasia (MD), high-grade dysplasia (HGD), squamous cell carcinoma (SCC), or no dysplasia/carcinoma categories, according to the most recent pathologic classification of oral cavity dysplasia by the World Health Organization (El-Naggar et al., 2022) [42]. The H&E slides with relevant findings (OSCC for the purposes of this study) were marked to indicate sections containing tumor for RNA extraction from unstained sections.
4.3. RNA Extraction
Each sample was delivered as 15 FFPE sections mounted on individual slides and an accompanying H&E slide. A number 11 scalpel blade with a number 3 handle was used to remove the paraffin surrounding the tissue area of interest. A new blade was then used to scrape the tissue of interest into a 1.5 mL microfuge tube. Five to fifteen slides were used for each sample. RNA was extracted using the Quick-RNA FFPE kit (Zymo Research, Irvine, CA, USA) and processed using the standard protocol except 2 volumes of 100% ethanol were used to increase the recovery of small fragment RNA. RNA was DNase-treated and purified using the RNA Clean and Concentrator-5 kit (Zymo Research, Irvine, CA, USA) and assessed for RNA quality using an Agilent 4200 TapeStation and the Standard RNA Assay Kit (Agilent Technologies, Santa Clara, CA, USA).
4.4. Bulk RNAseq Sequencing Method
Total RNA was used to prepare cDNA libraries using the Illumina® TruSeq® Stranded Total RNA Library Prep Globin (Illumina Inc., San Diego, CA, USA). RIN values for the RNA samples were assessed on an Agilent TapeStation (Agilent Technologies, Santa Clara, CA, USA). Briefly, 750 ng of total RNA was rRNA depleted, followed by enzymatic fragmentation, reverse transcription, and double-stranded cDNA purification using AMPure XP magnetic beads (Beckman Coulter, Inc., Brea, CA, USA). The cDNA was end repaired and 3′ adenylated, Illumina sequencing adaptors were ligated onto the fragment ends, and the stranded libraries were pre-amplified with PCR. The library size distribution was validated and quality inspected using an Agilent TapeStation (Agilent Technologies, Santa Clara, CA, USA). The quantity of each cDNA library was measured using the Qubit 3.0 (Thermo Fisher Scientific, Waltham, MA, USA). The libraries were pooled and sequenced on the Illumina NextSeq2000 or Illumina NovaSeq (Illumina Inc., San Diego, CA, USA).
4.5. Bioinformatics Analysis
Transcriptomic sequencing data for the breast cancer samples used for technical comparison were obtained from the publicly available Gene Expression Omnibus database (GEO; GSE58135; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE58135, accessed on 24 October 2024) for immunohistochemistry-confirmed estrogen receptor-positive breast cancer FFPE samples (N = 6) [43,44]. Adapters were trimmed from the reads of all samples (OSCC = 3; BC = 6), and the reads were aligned to the human reference genome (GRCh38.p13) using Spliced Transcripts Alignment to a Reference (STARv2.7.9a) [45]. The ‘genecounts’ module within STARv2.7.9a was used to count reads per gene [44]. Pythonv3.12.4 was used to merge the OSCC count data with the BC count data. DESeq2v1.40.2 was used in Rv4.3.0 to compare the FFPE gene counts of OSCC samples to those of BC samples. PCA was completed on gene counts and plotted in Rv4.3.0 using ggplot2v3.5.1 to show the grouping of OSCC samples compared to BC samples.
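The count-merging step mentioned above can be sketched as follows. This is a hedged illustration, not the authors' script: it assumes the per-sample files follow the layout of STAR's `ReadsPerGene.out.tab` output (four summary rows starting with `N_`, then one row per gene with unstranded, forward-strand, and reverse-strand counts). The function names, the sample labels, the toy gene IDs, the choice of the unstranded column, and the zero fill for genes missing from a sample are all assumptions.

```python
import csv
import io


def parse_reads_per_gene(text, column=1):
    """Parse STAR-style gene counts into {gene_id: count}.

    column: 1 = unstranded, 2 = forward-stranded, 3 = reverse-stranded.
    Which column fits a given library is an assumption to verify per protocol.
    """
    counts = {}
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or fields[0].startswith("N_"):
            continue  # skip blank lines and the N_unmapped/... summary rows
        counts[fields[0]] = int(fields[column])
    return counts


def merge_counts(samples):
    """Merge {sample: {gene: count}} into (sample_names, {gene: [counts]})."""
    names = list(samples)
    genes = sorted(set().union(*samples.values()))
    # Genes absent from a sample are filled with 0 (an illustrative choice)
    table = {g: [samples[s].get(g, 0) for s in names] for g in genes}
    return names, table


# Toy file contents with hypothetical gene IDs
header = "N_unmapped\t100\t100\t100\nN_multimapping\t50\t50\t50\n" \
         "N_noFeature\t10\t200\t20\nN_ambiguous\t5\t1\t2\n"
oscc_text = header + "ENSG_TOY_1\t12\t0\t12\nENSG_TOY_2\t7\t1\t6\n"
bc_text = header + "ENSG_TOY_1\t3\t0\t3\nENSG_TOY_3\t9\t0\t9\n"

names, table = merge_counts({
    "OSCC_1": parse_reads_per_gene(oscc_text),
    "BC_1": parse_reads_per_gene(bc_text),
})
print(names)
print(table)
```

A merged genes-by-samples table of raw integer counts is the form of input that count-based differential expression tools expect, which is why the summary rows are dropped rather than carried along.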
The Gagev2.50.0 and Pathviewv.1.40.0 libraries were used within Rv4.3.0 to investigate the differential KEGG pathways [46]. STRINGv12.0 was used to determine PPIs of up- or downregulated genes appearing in more than one differential KEGG pathway at the highest confidence level and the corresponding GO BP enrichment [47].
5. Conclusions
There is a clear distinction in the transcriptomic profiles of FFPE samples in the lesions of patients that developed OSCC compared to the FFPE samples of breast cancer patients. Genes belonging to the KRT family may be further investigated for their involvement in the OPMD to OSCC transition.
References
- 1. Kramer I.R., Lucas R.B., Pindborg J.J., Sobin L.H. Definition of leukoplakia and related lesions: An aid to studies on oral precancer. Oral Surg. Oral Med. Oral Pathol. 1978;46:518–539. PMID: 280847.
- 2. Gupta P.C., Bhonsle R.B., Murti P.R., Daftary D.K., Mehta F.S., Pindborg J.J. An epidemiologic assessment of cancer risk in oral precancerous lesions in India with special reference to nodular leukoplakia. Cancer 1989;63:2247–2252. doi: 10.1002/1097-0142(19890601)63:11<2247::AID-CNCR2820631132>3.0.CO;2-D. PMID: 2720574.
- 3. Mello F.W., Miguel A.F.P., Dutra K.L., Porporatti A.L., Warnakulasuriya S., Guerra E.N.S., Rivero E.R.C. Prevalence of oral potentially malignant disorders: A systematic review and meta-analysis. J. Oral Pathol. Med. 2018;47:633–640. doi: 10.1111/jop.12726. PMID: 29738071.
- 4. Porter S., Gueiros L.A., Leão J.C., Fedele S. Risk factors and etiopathogenesis of potentially premalignant oral epithelial lesions. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. 2018;125:603–611. doi: 10.1016/j.oooo.2018.03.008. PMID: 29891084.
- 5. Hernandez B.Y., Zhu X., Goodman M.T., Gatewood R., Mendiola P., Quinata K., Paulino Y.C., Wilson B.A. Betel nut chewing, oral premalignant lesions, and the oral microbiome. PLoS ONE 2017;12:e0172196. doi: 10.1371/journal.pone.0172196. PMID: 28225785. PMCID: PMC5321455.
- 6. Mougeot J.C., Beckman M.F., Langdon H.C., Lalla R.V., Brennan M.T., Bahrani Mougeot F.K. Haemophilus pittmaniae and Leptotrichia spp. Constitute a Multi-Marker Signature in a Cohort of Human Papillomavirus-Positive Head and Neck Cancer Patients. Front. Microbiol. 2022;12:794546. doi: 10.3389/fmicb.2021.794546. PMID: 35116012. PMCID: PMC8803733.
- 7. Singhvi H.R., Malik A., Chaturvedi P. The Role of Chronic Mucosal Trauma in Oral Cancer: A Review of Literature. Indian J. Med. Paediatr. Oncol. 2017;38:44–50. doi: 10.4103/0971-5851.203510. PMID: 28469336. PMCID: PMC5398106.
- 8. Tan Y., Wang Z., Xu M., Li B., Huang Z., Qin S., Nice E.C., Tang J., Huang C. Oral squamous cell carcinomas: State of the field and emerging directions. Int. J. Oral Sci. 2023;15:44. doi: 10.1038/s41368-023-00249-w. PMID: 37736748. PMCID: PMC10517027.
