# Mapping transcription factor binding sites by learning UV damage fingerprints

**Authors:** Hannah E Wilson, Scott Stevison, Levi Lamprey, John J Wyrick

PMC · DOI: 10.1093/nar/gkaf1014 · 2025-10-14

## TL;DR

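This paper introduces a method that maps transcription factor binding sites from the distinct patterns of UV-induced DNA damage (CPD "fingerprints") that binding creates. Machine learning on these patterns recovers sites missed by ChIP-based experiments and resolves them to single nucleotides.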

## Contribution

The approach trains machine learning models (a Random Forest and a neural network) on UV damage fingerprints in CPD-seq data to identify transcription factor binding sites with single-nucleotide resolution.

## Key findings

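- Binding by sequence-specific transcription factors induces distinct CPD fingerprints that machine learning methods can use to identify TF binding sites.
- In yeast, a Random Forest identified 75 Hap2/Hap3/Hap5 binding sites, including about 25 missed by previous ChIP-based experiments, and a neural network trained on only 6 known sites identified 63 Gcr1 binding sites.
- Applied to CPD-capture-sequencing data from human cells, the method identified new binding sites for the homologous Nuclear Factor-Y complex.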

## Abstract

Deciphering transcriptional networks requires methods to accurately map binding sites of sequence-specific transcription factors (ssTFs) across the genome. Here, we show that ssTF binding induces distinct patterns of UV-induced cyclobutane pyrimidine dimers (CPDs), and that these CPD ‘fingerprints’ can be exploited by machine learning methods to identify ssTF binding sites (TFBS). As a proof of principle, we analyzed CPD-seq data from yeast cells using the Random Forest algorithm to identify 75 TFBS bound by the Hap2/Hap3/Hap5 ssTF complex, including ∼25 new sites missed by previous chromatin immunoprecipitation (ChIP)-based experiments. Parallel analysis of the Gcr1 ssTF using a neural network trained on CPD-seq data including only 6 known binding sites identified 63 Gcr1 TFBS across the genome. Our analysis indicates that the newly identified TFBS are associated with many genes that function in expected categories (e.g. mitochondrial respiration or glycolysis), and whose mRNA levels are down-regulated in ssTF mutants. Similar analysis of CPD-capture-sequencing data from human cells identified new sites bound by the homologous Nuclear Factor-Y complex. These findings indicate that distinct cellular patterns of UV damage occurring at different classes of TFBS can be recognized by machine learning methods to map these regulatory elements with improved accuracy and single-nucleotide resolution.
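To make the idea of a CPD "fingerprint" concrete, the following is a minimal sketch of how per-position damage counts around a candidate motif could be turned into a feature vector for a classifier such as a Random Forest. It is not the authors' pipeline: the function name `cpd_fingerprint`, the `(chrom, position, strand)` dictionary layout, the `flank` window size, and the normalization by total counts are all assumptions made for illustration.

```python
def cpd_fingerprint(cpd_counts, chrom, center, motif_strand, flank=10):
    """Build a strand-aware, normalized CPD profile around a candidate site.

    cpd_counts is a hypothetical dict mapping (chrom, position, strand)
    to a CPD read count. Positions are walked 5'->3' relative to the
    motif, so sites on either strand yield comparable vectors.
    """
    sign = 1 if motif_strand == "+" else -1
    other_strand = "-" if motif_strand == "+" else "+"

    same, opposite = [], []
    for offset in range(-flank, flank + 1):
        pos = center + sign * offset
        same.append(cpd_counts.get((chrom, pos, motif_strand), 0))
        opposite.append(cpd_counts.get((chrom, pos, other_strand), 0))

    raw = same + opposite
    total = sum(raw)
    # Normalize so the vector reflects the damage pattern, not read depth
    # (an assumed choice; sites with no CPD reads become all zeros).
    return [count / total for count in raw] if total else [0.0] * len(raw)


# Toy data, invented purely to show the call shape.
counts = {("chrI", 100, "+"): 3, ("chrI", 101, "-"): 1}
fingerprint = cpd_fingerprint(counts, "chrI", 100, "+", flank=1)
```

Vectors like this, computed for known bound sites and for unbound motif matches, could then serve as the training examples for a supervised classifier; the choice of window and normalization would need tuning against real CPD-seq data.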


## Linked entities

- **Genes:** NFYA (nuclear transcription factor Y subunit alpha) [NCBI Gene 4800], NFYB (nuclear transcription factor Y subunit beta) [NCBI Gene 4801], NFYC (nuclear transcription factor Y subunit gamma) [NCBI Gene 4802], GCR1 (G-protein-coupled receptor 1) [NCBI Gene 841247], NF-YB7 (nuclear factor Y, subunit B7) [NCBI Gene 815843]
- **Chemicals:** UV (PubChem CID 155487962), CPDs (PubChem CID 35370)
- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Genes:** CPD (carboxypeptidase D) [NCBI Gene 1362] {aka GP180}, NFYC (nuclear transcription factor Y subunit gamma) [NCBI Gene 4802] {aka CBF-C, CBFC, H1TF2A, HAP5, HSM, NF-YC}, NFYB (nuclear transcription factor Y subunit beta) [NCBI Gene 4801] {aka CBF-A, CBF-B, HAP3, NF-YB}, HAP1 (huntingtin associated protein 1) [NCBI Gene 9001] {aka HAP2, HIP5, HLP, hHLP1}
- **Chemicals:** ssTF (-), CPDs (MESH:D011740)
- **Species:** Homo sapiens (human, species) [taxon 9606], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12526043/full.md

---
Source: https://tomesphere.com/paper/PMC12526043