Tracing regulatory element networks using epigenetic traits to identify key transcription factors: TENET R/Bioconductor package
Daniel J Mullen, Zexun Wu, Ethan Nelson-Moore, Huan Cao, Lauren Han, Ite A Offringa, Suhn K Rhie

TL;DR
TENET is a new R package that helps researchers identify key transcription factors and regulatory elements in specific cell types using epigenetic data.
Contribution
The TENET package introduces a novel method to trace regulatory element networks using epigenetic traits across diverse cell types and cancer data.
Findings
TENET integrates histone marks, open chromatin, DNA methylation, and gene expression data to identify cell type-specific TFs and REs.
The package includes methods to analyze findings with motifs, clinical data, and chromatin conformation datasets.
TENET was applied to pan-cancer data, revealing TFs and REs linked to ten cancer types.
Abstract
There is a lack of publicly available bioinformatic tools that can be widely used by researchers to identify transcription factors (TFs) that regulate cell type-specific regulatory elements (REs). To address this, we developed the Tracing regulatory Element Networks using Epigenetic Traits (TENET) R/Bioconductor package. By collecting hundreds of histone mark and open chromatin datasets from a variety of cell lines, primary cells, and tissues, and comparing these features along with matched DNA methylation and gene expression data, TENET identifies TFs and REs linked to a specific cell type. Moreover, we developed methods to interrogate findings using motifs, clinical information, and other genomic and chromatin conformation capture datasets, and applied them to pan-cancer data, highlighting TFs and REs associated with ten different cancer types. TENET enables researchers to better…
Funding
- USC Keck School of Medicine
- USC Center for Genetic Epidemiology
- USC Norris Comprehensive Cancer Center (funder ID 10.13039/100017192)
- John H. Richardson Endowed Postdoctoral Fellowship in Oncology Research
1 Introduction
Transcription factors (TFs) bind to regulatory elements (REs) to control the expression of numerous genes across the genome. REs include promoters, which are located proximally to the transcription start sites (TSSs) of the genes they regulate, and enhancers, which are located at a distance from the TSSs of their target genes. The locations of REs in a given cell type can be ascertained by profiling histone marks, including histone 3 lysine 4 trimethylation (H3K4me3, a mark of active promoters), histone 3 lysine 4 monomethylation (H3K4me1, a mark of poised and active enhancers), and histone 3 lysine 27 acetylation (H3K27ac, a mark of active enhancers) using ChIP-seq and its derivative methods such as CUT&RUN and CUT&Tag. DNase-seq and ATAC-seq can identify nucleosome-depleted regions (NDRs), which are open chromatin regions where TFs bind within REs. However, assessing the activities of REs in some cell and tissue types using these techniques remains challenging, as they necessitate the use of a substantial number of cells, are time-consuming to perform, and may be hindered by difficulties in acquiring fresh tissues and intact nuclei (Lee and Rhie 2021).
DNA methylation is one of the most thoroughly studied forms of epigenetic modification. DNA methylation levels near REs correlate inversely with the activities of those elements (Stadler et al. 2011, Aran et al. 2013, Blattler and Farnham 2013), indicating that DNA methylation can be utilized to annotate specific REs where TFs bind. DNA methylation is easily assayed from any type of cell and tissue, including formalin-fixed, paraffin-embedded (FFPE) tissue samples, using very few cells (Hinoue et al. 2012, Farlik et al. 2015).
Several studies have utilized DNA methylation and RNA-seq data to identify REs and their target genes in different cell types (Yao et al. 2015, Heyn et al. 2016, Rhie et al. 2016, Fleischer et al. 2017, Silva et al. 2019, Detilleux et al. 2022). However, a deficiency exists in publicly available and easily accessible bioinformatic tools for identifying TFs that regulate REs from DNA methylation and RNA-seq data by incorporating histone mark and open chromatin datasets. To address this gap, we upgraded previous TENET frameworks (Rhie et al. 2016, Mullen et al. 2020) and developed an R/Bioconductor package, which incorporates DNA methylation datasets, as well as datasets of histone marks and open chromatin, to identify and assess the activities of REs (promoters and enhancers). The TENET R/Bioconductor package also includes algorithms to allow users to easily combine epigenomic datasets with RNA-seq datasets to identify important TFs linked to RE dysregulation, which drive individual subgroups of cases compared to controls. Furthermore, it has numerous functions to interrogate findings with TF motif databases, patient survival information, and other genomic and chromatin conformation capture datasets. To demonstrate the effectiveness of the TENET R/Bioconductor package, we applied it to perform a pan-cancer analysis on datasets from ten distinct cancer types. These insights provide valuable information for understanding gene regulation in various cell types, as well as identifying potential biomarkers and therapeutic targets for clinical intervention.
2 Feature highlights
2.1 TENET design and features
The rationale behind TENET is that overexpression of a TF gene in a particular cell type (case) compared to another type (control) can lead to increased binding of the translated TF protein to numerous cell type-specific REs, resulting in widespread changes in the expression of downstream target genes controlled by these REs. Thus, the activities of TFs have important effects on the transcriptome of a cell type, determining cell fate (Fig. 1, available as supplementary data at Bioinformatics online). Unlike previous TENET frameworks, the TENET R/Bioconductor package (Fig. 1A) has been newly updated to utilize GenomicRanges and MultiAssayExperiment objects, allowing users to utilize a much wider range of epigenomic data as well as matched DNA methylation and gene expression data.
In addition to the main TENET package, we developed the TENET.AnnotationHub package, which contains datasets created by processing hundreds of epigenomic datasets, such as ChIP-seq and open chromatin datasets across cell and tissue types and ten cancer types to allow users to perform analyses without needing their own such datasets representing REs (Fig. 1B, Fig. 2, available as supplementary data at Bioinformatics online) (Thurman et al. 2012, Andersson et al. 2014, Forrest et al. 2014, Kundaje et al. 2015, Corces et al. 2018, Moore et al. 2020). We also created the TENET.ExperimentHub package with example datasets for easy use in the TENET R/Bioconductor package. We have added functionality to analyze REs at both promoters and enhancers and have included new algorithms to automatically set methylation cutoffs to identify dysregulated REs (Fig. 3, available as supplementary data at Bioinformatics online) in TENET. The package also allows users to interrogate findings with motif searching, perform survival analyses using both the gene expression of key TF genes identified by TENET (Fig. 4, available as supplementary data at Bioinformatics online) and the DNA methylation levels of the identified RE sites, generate heatmaps contrasting the expression of the identified TF genes with the DNA methylation of their linked RE sites, and integrate other genomic and chromatin conformation capture datasets.
Figure 1. Overview of the TENET R/Bioconductor package. (A) TENET includes seven steps. In step 1, histone mark and open chromatin data that annotate regulatory elements (REs) and nucleosome-depleted regions (NDRs) are used to identify DNA methylation sites which are located in REs. In step 2, using DNA methylation data, RE DNA methylation sites are classified into four different categories based on their methylation levels in case and control samples: unmethylated in both, hypomethylated in case compared to control, hypermethylated in case compared to control, and methylated in both. In step 3, using matched gene expression data, TENET computes Z-scores linking RE DNA methylation sites with gene expression to identify the transcription factor (TF) gene–RE site links genome-wide which show significant differences in DNA methylation of the RE sites and gene expression in a subset of the case samples compared to the control samples. In step 4, statistically significant TF gene–RE site links are identified for each RE DNA methylation site by ranking and performing multiple testing correction on the total number of links. In step 5, by performing additional statistical tests (e.g. Wilcoxon rank-sum test, adjusted P-value < .05), TENET optimizes the identification of links. In step 6, by calculating the number of linked RE DNA methylation sites per TF, key TFs linked to numerous RE sites are identified. Finally, step 7 functions perform downstream analyses by integrating multi-omic datasets to better characterize the identified TFs, REs, and their target genes. (B) TENET includes built-in epigenomic datasets to aid the user in identifying DNA methylation sites located in REs. These datasets provide significant coverage of both DNA methylation probes included in Illumina Human Methylation arrays (HM450, EPIC v1, EPIC v2), as well as all CpG sites throughout the genome. (C) Bar plots display the key TFs identified through a pan-cancer TENET analysis using TCGA data. The top 5 TFs linked to the largest number of hypomethylated enhancer sites (HM450 probes) are shown for each cancer type.
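The four-way classification in step 2 of the legend can be illustrated with a minimal sketch. It is written in plain Python for brevity and does not use the TENET API. The function name `classify_site`, the beta-value cutoffs (0.2 and 0.8), the minimum difference (0.2), and the use of group means are all assumptions made for illustration; TENET sets its methylation cutoffs with its own algorithms.

```python
from statistics import mean


def classify_site(case_betas, control_betas,
                  unmeth_cutoff=0.2, meth_cutoff=0.8, min_delta=0.2):
    """Assign one RE DNA methylation site to one of four categories.

    Beta values run from 0 (unmethylated) to 1 (fully methylated).
    All cutoffs are hypothetical defaults, not TENET's values.
    Returns None when the site fits none of the four categories.
    """
    case_mean = mean(case_betas)
    control_mean = mean(control_betas)

    # Low in both groups: unmethylated in case and control.
    if case_mean < unmeth_cutoff and control_mean < unmeth_cutoff:
        return "unmethylated"
    # High in both groups: methylated in case and control.
    if case_mean > meth_cutoff and control_mean > meth_cutoff:
        return "methylated"

    # Otherwise require a minimum shift between the two groups.
    delta = case_mean - control_mean
    if delta <= -min_delta:
        return "hypomethylated"   # lower in case than in control
    if delta >= min_delta:
        return "hypermethylated"  # higher in case than in control
    return None
```

Under these assumptions, a site with case beta values around 0.25 and control beta values around 0.85 would be labeled hypomethylated, the category that the pan-cancer analysis associates with activated REs.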
The TENET R/Bioconductor package currently consists of a collection of functions divided into seven steps that are designed to be run in succession, although a subset of these functions can be used independently if desired (Fig. 1A). The easyTENET wrapper function has also been included, which runs steps 1 through 6 as a single function with simplified options for ease of use.
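The logic of steps 3, 4, and 6 can be sketched as follows. This is an illustrative Python outline under stated assumptions, not TENET's implementation: the Z-score definition (expression in hypomethylated samples relative to the remaining samples), the 0.3 cutoff, and all function names are hypothetical, and Benjamini-Hochberg is used as a stand-in because the text does not name the multiple testing correction.

```python
from collections import Counter
from statistics import mean, stdev


def link_z_score(expression, betas, hypometh_cutoff=0.3):
    """Step 3 (assumed form): Z-score of a TF gene's expression in samples
    hypomethylated at one RE site, relative to the remaining samples.
    `expression` and `betas` are parallel lists, one value per sample."""
    hypo = [e for e, b in zip(expression, betas) if b < hypometh_cutoff]
    rest = [e for e, b in zip(expression, betas) if b >= hypometh_cutoff]
    if not hypo or len(rest) < 2:
        return None  # too few samples to compare
    spread = stdev(rest)
    if spread == 0:
        return None  # no variation in the reference group
    return (mean(hypo) - mean(rest)) / spread


def benjamini_hochberg(p_values):
    """Step 4 (stand-in method): BH-adjusted p-values, original order kept."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_tfs(links, top_n=5):
    """Step 6: count distinct linked RE sites per TF, return the top_n TFs.
    `links` is an iterable of (tf, site_id) pairs that passed filtering."""
    counts = Counter(tf for tf, _site in set(links))
    return counts.most_common(top_n)
```

For example, `rank_tfs([("FOXA1", "site1"), ("FOXA1", "site2"), ("TP63", "site1")])` places FOXA1 first with two linked sites; the site identifiers here are placeholders rather than real probe IDs.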
2.2 Regulatory elements and transcription factors linked to 10 cancer types are identified by using TENET
To illustrate the efficacy of using the TENET R/Bioconductor package, we applied it to a panel of Illumina HumanMethylation450 (HM450) DNA methylation and gene expression datasets consisting of both tumor and adjacent-normal samples from ten cancer types, downloaded from The Cancer Genome Atlas (TCGA) (Colaprico et al. 2016). Using the integrated epigenomic datasets built into TENET, we identified over 180 000 HM450 DNA methylation probes in promoters and over 90 000 probes in enhancers across the ten cancer types, including bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), esophageal carcinoma (ESCA), head and neck squamous cell carcinoma (HNSC), kidney renal papillary cell carcinoma (KIRP), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), and thyroid carcinoma (THCA). Of these, the majority of enhancer probes in each cancer type were hypomethylated in tumor (case) compared to normal (control) samples (Fig. 5, available as supplementary data at Bioinformatics online), while the majority of promoter probes were hypermethylated.
Next, we identified key TFs whose overexpression is associated with the activation of numerous cell type-specific REs (hypomethylated RE sites) for the ten cancer types (Fig. 1C, Fig. 6, available as supplementary data at Bioinformatics online). We found both well-known TFs, reported to be activated in tumors and involved in RE networks, as well as novel TFs. For example, TP63 was one of the key TFs identified to be linked to enhancers in ESCA and BLCA. TP63 has been previously shown to promote the growth of ESCA cells by regulating the cell cycle and the Akt pathway (Ye et al. 2014). TP63 has also been previously associated with the basal subtype of BLCA (Iyyanki et al. 2021). FOXA1, known as a pioneering TF (Cirillo et al. 2002, Iwafuchi-Doi et al. 2016, Mayran and Drouin 2018), was identified as the top TF in both BLCA and BRCA. FOXA1 plays a key role in cancer by regulating the nuclear steroid receptors to control the transcriptomes of subtypes of BRCA and BLCA, as highlighted by previous studies (DeGraff et al. 2012, Bernardo et al. 2013, Rhie et al. 2016, Warrick et al. 2016, Fu et al. 2019, Sikic et al. 2020, Iyyanki et al. 2021, Seachrist et al. 2021). We also identified novel TFs such as NOBOX in HNSC, ZNF280C in KIRP, and POU6F2 in LUSC, which potentially regulate cancer-specific regulatory networks yet remain relatively understudied in those cancer types (Fig. 1C). To further illustrate TENET’s features, we examined the association between the expression of the identified TFs and DNA methylation of their linked RE sites with patient survival, identifying those that are statistically significantly associated with KIRP patient survival (Fig. 7A and B, available as supplementary data at Bioinformatics online). Lastly, we identified the location of TF motifs at REs and the potential target genes of REs by integrating ChIP-seq and chromatin conformation capture (e.g. Hi-C) datasets in KIRP and LIHC, respectively (Fig. 7C and D, available as supplementary data at Bioinformatics online).
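The survival comparison described above can be illustrated with a small sketch. The text states only that survival analyses were performed; the Kaplan-Meier estimator and the median split of patients by TF expression used here are assumptions for illustration, and the function names are hypothetical rather than part of TENET.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time per patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Patients who died or were censored at t leave the risk set.
        at_risk -= leaving
    return curve


def split_by_median(values):
    """Label each patient 'high' or 'low' relative to the median value
    (for example, the expression of one TF gene)."""
    cutoff = median(values)
    return ["high" if v > cutoff else "low" for v in values]
```

In such a sketch, one curve would be computed for the "high" group and one for the "low" group, and the two would then be compared with a formal test, which is omitted here.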
3 Conclusions and future directions
The TENET R/Bioconductor package enables the identification of key TFs and REs in the cell type of interest by combining multi-omic datasets. This R/Bioconductor package is a substantially upgraded version of previous TENET frameworks (Rhie et al. 2016, Mullen et al. 2020), which includes new features and datasets. In this study, we demonstrated the use of TENET to detect cell type-specific TFs and REs that are dysregulated in ten cancer types, highlighting those linked to numerous cancer-specific epigenomic changes. Identified TFs and REs, including ones associated with patient survival, will accelerate the further development of biomarkers and therapeutic strategies. While we used cancer datasets to showcase our method, TENET can utilize DNA methylation and gene expression datasets from any cell or disease group to identify key TFs and REs. All of TENET’s functions, including those for searching TF motifs, using topologically associating domains (TADs) to further characterize the target genes of REs and TFs, and identifying activated or inactivated TFs and REs, serve as valuable resource tools. The supplementary TENET.AnnotationHub package also includes datasets useful for identifying REs, which we created by archiving and processing hundreds of epigenomic datasets.
TENET uses DNA methylation to assess the activity of REs, so we cannot evaluate REs that do not have DNA methylation sites nearby. Although we applied TENET to ten cancer datasets generated from bulk tissues, we anticipate that this approach will be applicable to single-cell methyl-seq and single-cell RNA-seq when sufficient data become available. TFs, REs, and links identified using TENET provide invaluable resources, but functional assays, such as performing TF ChIP-seq and its derivative methods, are still required to validate the predictions, since these links were identified through the evaluation of statistical associations and thus, may include indirectly associated TFs.
Supplementary Material
btaf435_Supplementary_Data
References
- Andersson R, Gebhard C, Miguel-Escalada I et al. An atlas of active enhancers across human cell types and tissues. Nature 2014;507:455–61. doi:10.1038/nature12787. PMID: 24670763. PMCID: PMC5215096.
- Aran D, Sabato S, Hellman A. DNA methylation of distal regulatory sites characterizes dysregulation of cancer genes. Genome Biol 2013;14:R21. doi:10.1186/gb-2013-14-3-r21. PMID: 23497655. PMCID: PMC4053839.
- Bernardo GM, Bebek G, Ginther CL et al. FOXA1 represses the molecular phenotype of basal breast cancer cells. Oncogene 2013;32:554–63. doi:10.1038/onc.2012.62. PMID: 22391567. PMCID: PMC3371315.
- Blattler A, Farnham PJ. Cross-talk between site-specific transcription factors and DNA methylation states. J Biol Chem 2013;288:34287–94. doi:10.1074/jbc.R113.512517. PMID: 24151070. PMCID: PMC3843044.
- Cirillo LA, Lin FR, Cuesta I et al. Opening of compacted chromatin by early developmental transcription factors HNF3 (FoxA) and GATA-4. Mol Cell 2002;9:279–89. doi:10.1016/s1097-2765(02)00459-8. PMID: 11864602.
- Colaprico A, Silva TC, Olsen C et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res 2016;44:e71. doi:10.1093/nar/gkv1507. PMID: 26704973. PMCID: PMC4856967.
- Corces MR, Granja JM, Shams S et al; Cancer Genome Atlas Analysis Network. The chromatin accessibility landscape of primary human cancers. Science 2018;362:eaav1898. doi:10.1126/science.aav1898. PMID: 30361341. PMCID: PMC6408149.
- DeGraff DJ, Clark PE, Cates JM et al. Loss of the urothelial differentiation marker FOXA1 is associated with high grade, late stage bladder cancer and increased tumor proliferation. PLoS One 2012;7:e36669. doi:10.1371/journal.pone.0036669. PMID: 22590586. PMCID: PMC3349679.
