# MetArea: a software package for analysis of the mutually exclusive occurrence in pairs of motifs of transcription factor binding sites based on ChIP-seq data

**Authors:** V.G. Levitsky, A.V. Tsukanov, T.I. Merkulova

PMC · DOI: 10.18699/vjgb-24-90 · Vavilov Journal of Genetics and Breeding · 2024-12-01

## TL;DR

MetArea is a software tool that analyzes ChIP-seq data to identify pairs of transcription factor binding site motifs that occur mutually exclusively in genomic peaks.

## Contribution

MetArea introduces a method to detect mutually exclusive motif pairs and assess their combined recognition performance in ChIP-seq data.

## Key findings

- MetArea identifies pairs of motifs with mutually exclusive occurrences in peaks.
- The software evaluates the recognition performance of individual and joint motifs using pAUPRC.
- The goal is to find motif pairs for which the joint motif outperforms both individual motifs.
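The comparison of single and joint motifs by pAUPRC can be illustrated with a minimal sketch. It is not MetArea's implementation. Several things here are assumptions made for illustration: the partial area is truncated at a recall cutoff (`max_recall`), the joint motif score for a sequence is the maximum of the two single-motif scores, and the function names and toy scores are hypothetical. The paper may define the partial region and the joint motif differently.

```python
from typing import List, Tuple


def pr_points(pos: List[float], neg: List[float]) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, sweeping the score threshold from high to low.

    `pos` holds scores of foreground sequences (peaks), `neg` scores of background ones.
    """
    labeled = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg],
                     key=lambda x: -x[0])
    points = []
    tp = fp = 0
    i, n = 0, len(labeled)
    while i < n:
        j = i
        # Consume all sequences tied at the same score before emitting a point.
        while j < n and labeled[j][0] == labeled[i][0]:
            tp += labeled[j][1]
            fp += 1 - labeled[j][1]
            j += 1
        points.append((tp / len(pos), tp / (tp + fp)))
        i = j
    return points


def partial_auprc(pos: List[float], neg: List[float], max_recall: float = 1.0) -> float:
    """Trapezoidal area under the PR curve for recall in [0, max_recall] (assumed cutoff)."""
    pts = pr_points(pos, neg)
    area = 0.0
    prev_r, prev_p = 0.0, pts[0][1]
    for r, p in pts:
        if r >= max_recall:
            if r > prev_r:
                # Interpolate precision linearly at the recall cutoff.
                p_cut = prev_p + (p - prev_p) * (max_recall - prev_r) / (r - prev_r)
                area += (max_recall - prev_r) * (prev_p + p_cut) / 2
            break
        area += (r - prev_r) * (prev_p + p) / 2
        prev_r, prev_p = r, p
    return area


def joint_scores(a: List[float], b: List[float]) -> List[float]:
    """Assumed joint motif: best of the two single-motif scores per sequence."""
    return [max(x, y) for x, y in zip(a, b)]


# Toy data: motif A hits peaks 1 and 3, motif B hits peaks 2 and 4.
a_pos, b_pos = [0.9, 0.1, 0.9, 0.1], [0.1, 0.9, 0.1, 0.9]
a_neg, b_neg = [0.2, 0.3], [0.3, 0.2]

auprc_a = partial_auprc(a_pos, a_neg)
auprc_b = partial_auprc(b_pos, b_neg)
auprc_joint = partial_auprc(joint_scores(a_pos, b_pos), joint_scores(a_neg, b_neg))
print(auprc_a, auprc_b, auprc_joint)
```

In this toy case each single motif recovers only half of the peaks at high precision, while the joint score separates all peaks from the background, so the joint area exceeds both single-motif areas.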

## Abstract

ChIP-seq technology, which is based on chromatin immunoprecipitation (ChIP), maps a set of genomic loci (peaks) containing binding sites (BS) for the investigated (target) transcription factor (TF). A TF may recognize several structurally different BS motifs. The multiprotein complex mapped in a ChIP-seq experiment includes the target TF and other “partner” TFs linked by protein-protein interactions, and not all of these TFs bind to DNA directly. Therefore, the BS motifs enriched in peaks may belong to both target and partner TFs. A de novo search approach is used to find enriched TF BS motifs in ChIP-seq data. For a pair of enriched BS motifs, either co-occurrence or mutually exclusive occurrence can be detected from a set of peaks: co-occurrence means that the two motifs occur more frequently in the same peaks, while mutually exclusive occurrence means that they are detected more frequently in different peaks. We propose the MetArea software package to identify pairs of TF BS motifs with mutually exclusive occurrence in ChIP-seq data. MetArea was designed to predict the structural diversity of BS motifs of the same TF and the functional relation of BS motifs of different TFs. The functional relation of the motifs of two distinct TFs presumes that they are interchangeable as part of a multiprotein complex that uses the BS of these TFs to bind directly to DNA in different peaks. MetArea estimates the recognition performance, measured as pAUPRC (partial area under the Precision–Recall curve), for each of the two input single motifs, identifies the “joint” motif, and computes the performance for it too. The goal of the analysis is to find pairs of single motifs A and B for which the accuracy of the joint A&B motif is higher than that of either single motif.
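The distinction between co-occurrence and mutually exclusive occurrence can be made concrete with a 2×2 table of peaks (both motifs, only A, only B, neither). The abstract does not state which statistical test, if any, MetArea applies to such counts. The sketch below uses a one-sided Fisher's exact test for depletion of co-occurrence as an assumed, commonly used choice; the function name and the counts are hypothetical.

```python
from math import comb


def exclusivity_pvalue(both: int, only_a: int, only_b: int, neither: int) -> float:
    """One-sided Fisher's exact test for depleted co-occurrence.

    Returns P(co-occurrence count <= observed) when motif hits are placed
    independently with the same marginal totals (hypergeometric model).
    A small value suggests the two motifs tend to occupy different peaks.
    """
    n = both + only_a + only_b + neither
    a_total = both + only_a
    b_total = both + only_b
    # Smallest co-occurrence count compatible with the marginal totals.
    lowest = max(0, a_total + b_total - n)
    tail = sum(comb(a_total, k) * comb(n - a_total, b_total - k)
               for k in range(lowest, both + 1))
    return tail / comb(n, b_total)


# Toy counts: 10 peaks, motifs A and B never share a peak.
p_exclusive = exclusivity_pvalue(both=0, only_a=5, only_b=5, neither=0)
# Toy counts: 10 peaks, motifs A and B always share a peak.
p_cooccurring = exclusivity_pvalue(both=5, only_a=0, only_b=0, neither=5)
print(p_exclusive, p_cooccurring)
```

A test of this kind addresses only the frequency pattern. The performance comparison described in the abstract (joint motif versus single motifs by pAUPRC) is a separate criterion.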

## Full-text entities

- **Genes:** F3 (coagulation factor III, tissue factor) [NCBI Gene 2152] {aka CD142, TF, TFA}, FOXA2 (forkhead box A2) [NCBI Gene 3170] {aka HNF-3-beta, HNF3B, TCF3B}, PTEN (phosphatase and tensin homolog) [NCBI Gene 5728] {aka 10q23del, BZS, CWS1, DEC, GLM2, MHAM}, BHLHE40 (basic helix-loop-helix family member e40) [NCBI Gene 8553] {aka BHLHB2, Clast5, DEC1, HLHB2, SHARP-2, SHARP2}, E2F1 (E2F transcription factor 1) [NCBI Gene 1869] {aka E2F-1, RBAP1, RBBP3, RBP3}, AR (androgen receptor) [NCBI Gene 367] {aka AIS, AR8, DHTR, HPCX3, HUMARA, HYSP1}, Fdxr (ferredoxin reductase) [NCBI Gene 14149] {aka AR}, FOXA1 (forkhead box A1) [NCBI Gene 3169] {aka HNF3A, TCF3A}, EGR2 (early growth response 2) [NCBI Gene 1959] {aka AT591, CMT1D, CMT4E, KROX20}, BHLHA15 (basic helix-loop-helix family member a15) [NCBI Gene 168620] {aka BHLHB8, MIST1}, EGR1 (early growth response 1) [NCBI Gene 1958] {aka AT225, G0S30, KROX-24, NGFI-A, TIS8, ZIF-268}, Foxa1 (forkhead box A1) [NCBI Gene 15375] {aka Hnf-3a, Hnf3a, Tcf-3a, Tcf3a}, NFASC (neurofascin) [NCBI Gene 23114] {aka NEDCPMD, NF, NRCAML}, E2f4 (E2F transcription factor 4) [NCBI Gene 104394] {aka 2010111M04Rik}, Ctcf (CCCTC-binding factor) [NCBI Gene 13018], Bhlha15 (basic helix-loop-helix family, member a15) [NCBI Gene 17341] {aka 1810009C13Rik, Bhlhb8, MIST-1, Mist1}, FOXN3 (forkhead box N3) [NCBI Gene 1112] {aka C14orf116, CHES1, PRO1635}, IRF2 (interferon regulatory factor 2) [NCBI Gene 3660] {aka IRF-2}, ETS2 (ETS proto-oncogene 2, transcription factor) [NCBI Gene 2114] {aka ETS2IT1}
- **Diseases:** prostate tumorigenesis (MESH:D011472), prostate cancer (MESH:D011471)
- **Chemicals:** lipopolysaccharide (MESH:D008070), AntiNoise SP (-)
- **Species:** Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702], Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11813801/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11813801/full.md

---
Source: https://tomesphere.com/paper/PMC11813801