# TRAFICA: an open chromatin language model to improve transcription factor binding affinity prediction

**Authors:** Yu Xu, Chonghao Wang, Ke Xu, Yi Ding, Aiping Lyu, Lu Zhang

PMC · DOI: 10.1093/bioinformatics/btaf469 · 2025-08-23

## TL;DR

TRAFICA is a language model pretrained on open chromatin sequences from ATAC-seq experiments and fine-tuned on in vitro binding profiles to predict how strongly transcription factors bind to DNA.

## Contribution

TRAFICA integrates open chromatin data with in vitro binding profiles to enhance TF–DNA binding affinity prediction.

## Key findings

- TRAFICA outperforms existing prediction tools and advanced DNA language models in predicting TF–DNA binding affinity.
- Pretraining on sequences from open chromatin regions improves prediction accuracy.
- TRAFICA achieves state-of-the-art performance in both in vitro and in vivo settings.

## Abstract

In silico transcription factor and DNA (TF–DNA) binding affinity prediction plays a vital role in examining TF binding preferences and understanding gene regulation. Existing tools employ TF–DNA binding profiles from in vitro high-throughput technologies to predict TF–DNA binding affinity. However, TFs tend to bind to sequences in open chromatin regions in vivo, and this binding preference is seldom considered by existing tools.

In this study, we developed TRAFICA, an open chromatin language model that predicts TF–DNA binding affinity by integrating sequence characteristics of open chromatin regions from ATAC-seq experiments with in vitro TF–DNA binding profiles from high-throughput technologies. We pretrained TRAFICA on over 2.8 million nucleotide sequences in open chromatin regions derived from 197 ATAC-seq experiments (115 cell lines) to learn in vivo TF binding preferences. We then fine-tuned TRAFICA using low-rank adaptation (LoRA) on PBM and HT-SELEX TF–DNA binding profiles to learn intrinsic binding preferences for specific TFs. We systematically evaluated TRAFICA and compared its predictive performance with existing prediction tools and advanced DNA language models. The experimental results demonstrated that TRAFICA significantly outperformed these methods in predicting in vitro and in vivo TF–DNA binding affinity, achieving state-of-the-art performance. These findings indicate that considering the sequence characteristics of open chromatin regions could significantly improve TF–DNA binding affinity prediction.
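
For readers unfamiliar with LoRA, the fine-tuning idea mentioned in the abstract can be sketched as follows. This is not TRAFICA's code: the class name `LoRALinear`, the rank, the scaling factor `alpha`, and the initialization are illustrative assumptions, and a real implementation would typically use a deep-learning framework rather than plain Python lists. The sketch only shows the general technique: the pretrained weight matrix stays frozen, and a small low-rank product `B @ A` is the only part that would be trained for each TF.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical linear layer with a frozen weight and a low-rank update."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight            # pretrained weight, kept frozen
        self.scale = alpha / rank       # common LoRA scaling (assumed here)
        # A starts with small random values, B starts at zero, so the
        # adapted layer initially matches the pretrained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)                # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))   # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Number of parameters in A and B: rank * (d_in + d_out)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


pretrained = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
layer = LoRALinear(pretrained, rank=1)
output = layer.forward([1.0, 1.0, 1.0])
```

Because `B` is initialized to zero, `output` equals the frozen layer's output before any training step. The number of trainable values grows as `rank * (d_in + d_out)` instead of `d_in * d_out`, which is why a separate adapter per TF stays small relative to the pretrained model.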

The source code of TRAFICA and detailed tutorials are available at https://github.com/ericcombiolab/TRAFICA.

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12582366/full.md

---
Source: https://tomesphere.com/paper/PMC12582366