# TFBSFootprinter: a multiomics tool for prediction of transcription factor binding sites in vertebrate species

**Authors:** Harlan R. Barker, Seppo Parkkila, Martti E.E. Tolvanen

PMC · DOI: 10.1080/21541264.2025.2521764 · 2025-07-11

## TL;DR

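TFBSFootprinter is a new tool that predicts transcription factor binding sites in the gene promoters of 317 vertebrate species by combining several types of transcription-relevant data. On a curated, experimentally verified benchmark it outperformed DeepBind, DeepSEA, FIMO, and a traditional PWM.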

## Contribution

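TFBSFootprinter introduces a multiomics approach for predicting functional TFBSs in gene promoters across 317 vertebrate species. It draws on seven types of transcription-relevant data in humans and three in other vertebrates, and analyses are keyed on Ensembl transcript IDs so that setup stays minimal.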

## Key findings

- TFBSFootprinter achieved an average AUC (area under the ROC curve) of 0.881 using all multiomic data, compared with 0.854 for a traditional PWM, 0.817 for FIMO, 0.798 for DeepBind, and 0.682 for DeepSEA.
- Selecting the best overall combination of multiomic data raised the average AUC to 0.910.
- The tool is available as Conda and Python packages for easy use.

## Abstract

Transcription factor (TF) proteins play a critical role in the regulation of eukaryotic gene expression via sequence-specific binding to genomic locations known as transcription factor binding sites (TFBSs). Accurate prediction of TFBSs is essential for understanding gene regulation, disease mechanisms, and drug discovery. These studies are therefore relevant not only in humans but also in model organisms and domesticated and wild animals. However, current tools for the automatic analysis of TFBSs in gene promoter regions are limited in their usability across multiple species. To our knowledge, no tools currently exist that allow for automatic analysis of TFBSs in gene promoter regions for many species.

The TFBSFootprinter tool combines multiomic transcription-relevant data for more accurate prediction of functional TFBSs in 317 vertebrate species. In humans, this includes vertebrate sequence conservation (GERP), proximity to transcription start sites (FANTOM5), correlation of expression between target genes and TFs predicted to bind promoters (FANTOM5), overlap with ChIP-Seq TF metaclusters (GTRD), overlap with ATAC-Seq peaks (ENCODE), eQTLs (GTEx), and the observed/expected CpG ratio (Ensembl). In non-human vertebrates, this includes GERP, proximity to transcription start sites, and CpG ratio.
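One of the features listed above, the observed/expected CpG ratio, can be illustrated with a short sketch. The abstract does not state the exact formula or window size the tool uses, so the code below assumes the commonly used definition (CpG count times sequence length, divided by the product of the C and G counts). The function name `cpg_obs_exp` is hypothetical and is not part of the TFBSFootprinter package.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio for a DNA sequence.

    Assumed definition (not confirmed by the source):
        (number of CG dinucleotides * sequence length) / (number of C * number of G)
    Returns 0.0 when the sequence has no C or no G, since the
    expected count is then zero and the ratio is undefined.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    # A CpG-dense toy sequence versus one with C and G but no CpG.
    print(cpg_obs_exp("CGCGCGCG"))
    print(cpg_obs_exp("CCAAGG"))
```

A ratio near or above 1 indicates that CpG dinucleotides occur about as often as base composition alone would predict, which is typical of CpG islands found near many promoters.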

TFBSFootprinter analyses are based on the Ensembl transcript ID for simplicity of use and require minimal setup steps. Benchmarking of the TFBSFootprinter on a manually curated and experimentally verified dataset of TFBSs produced superior results when using all multiomic data (average area under the receiver operating characteristic curve, 0.881), compared with DeepBind (0.798), DeepSEA (0.682), FIMO (0.817) and traditional PWM (0.854). The results were further improved by selecting the best overall combination of multiomic data (0.910). Additionally, we determined combinations of multiomic data that provide the best model of binding for each TF. TFBSFootprinter is available as Conda and Python packages.
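The benchmark figures above are areas under the ROC curve. As a minimal sketch of what that metric measures, the code below computes it directly as the probability that a randomly chosen true binding site scores higher than a randomly chosen background site, with ties counted as half. This is not the authors' benchmarking pipeline; the function name `roc_auc` and the toy scores are assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equivalent to the normalized Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs in which the positive scores higher,
    counting ties as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both positive and negative scores are required")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical scores: verified sites versus background sites.
    verified = [0.9, 0.3]
    background = [0.5, 0.1]
    print(roc_auc(verified, background))
```

An AUC of 0.5 corresponds to scoring that cannot separate true sites from background, and 1.0 to perfect separation, which is the scale on which the reported values of 0.881 and 0.910 should be read.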

## Linked entities

- **Proteins:** SEP2 (K-box region and MADS-box transcription factor family protein)
- **Species:** Homo sapiens (taxon 9606), Mus musculus (taxon 10090)

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12258250/full.md

---
Source: https://tomesphere.com/paper/PMC12258250