# A systematic guide for identifying transcription factors that directly regulate the expression of a gene of interest

**Authors:** Andrew D. Bates, Dawid Grzela, Maciej Studzian, Louise Brennan, Moli Williams, Conor Fawcett, Beth Hammond, Manreen Grewal, Marcin Ratajewski, Lukasz Pulaski, Urszula L. McClurg

PMC · DOI: 10.1101/gr.281154.125 · Genome Research · 2026-03-01

## TL;DR

This paper reviews methods to identify which transcription factors directly control a specific gene's expression.

## Contribution

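The paper introduces a conceptual matrix of biological relevance that classifies experimental systems by the origin of their DNA and protein elements (cis and trans), and uses it to evaluate the false-positive and false-negative risks of methods for identifying gene-specific TFs.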

## Key findings

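- Identifying the TFs that bind and regulate a specific gene remains difficult because each gene sits in a multilayered regulatory environment: 3D chromatin structure, enhancer–promoter looping, DNA accessibility, histone modifications, and cell state–dependent protein dynamics.
- Perturbation strategies (gain and loss of function) complement steady-state profiling and help establish causality in gene regulation.
- The proposed framework guides the design of experiments that balance biological relevance, sensitivity, and interpretability.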

## Abstract

Transcriptional regulation lies at the heart of cellular identity and function, hinging on the precise binding of transcription factors (TFs) and cofactors to gene regulatory elements such as promoters and enhancers. Although it is relatively routine to profile genome-wide DNA binding landscapes of proteins, identifying the specific proteins that bind to, and regulate the transcription of, a particular gene of interest (GOI) remains a persistent experimental and conceptual challenge. This gene-centric question is complicated by the multilayered regulatory environment in which each gene resides, comprising 3D chromatin structure, enhancer–promoter looping, DNA accessibility, histone modifications, and cell state–dependent protein dynamics. In this review, we dissect the strengths, limitations, and biological relevance of current approaches for studying direct protein–DNA interactions, distinguishing between protein-centric and DNA-centric methodologies. We introduce a conceptual matrix of biological relevance, integrating the origin of DNA and protein elements (cis and trans) to evaluate false-positive and false-negative risks across experimental systems. Moreover, we explore how perturbation strategies—gain and loss of function—can complement steady-state profiling to establish causality in gene regulation. By critically examining both established tools and emerging techniques such as genome editing, synthetic chromosomes, and high-resolution imaging, we provide a practical framework for investigators seeking to uncover direct regulators of specific genes. Our goal is to guide the design of experiments that balance biological relevance, sensitivity, and interpretability to ultimately answer a deceptively simple question: What TFs directly regulate the expression of my GOI?

## Linked entities

- **Proteins:** tf.S (transferrin S homeolog)

## Full-text entities

- **Genes:** H2AZ1 (H2A.Z variant histone 1) [NCBI Gene 3015] {aka H2A.Z-1, H2A.z, H2A/z, H2AFZ, H2AZ}, RELA (RELA proto-oncogene, NF-kB subunit) [NCBI Gene 5970] {aka AIF3BL3, CMCU, NFKB3, p65}, REG1A (regenerating family member 1 alpha) [NCBI Gene 5967] {aka ICRF, P19, PSP, PSPS, PSPS1, PTP}, F3 (coagulation factor III, tissue factor) [NCBI Gene 2152] {aka CD142, TF, TFA}, CFTR (CF transmembrane conductance regulator) [NCBI Gene 1080] {aka ABC35, ABCC7, CF, CFTR/MRP, MRP7, TNR-CFTR}, APEX2 (apurinic/apyrimidinic endodeoxyribonuclease 2) [NCBI Gene 27301] {aka APE2, APEXL2, XTH2, ZGRF2}, GBA3 (glucosylceramidase beta 3 (gene/pseudogene)) [NCBI Gene 57733] {aka CBG, CBGL1, GLUC, KLRP}, HSF1 (heat shock transcription factor 1) [NCBI Gene 3297] {aka HSTF1}
- **Diseases:** BLI (MESH:C564543), GOI (MESH:C537680), toxicity (MESH:D064420), hematopoietic diseases (MESH:D019337), hypoxia (MESH:D000860), phototoxicity (MESH:D017484), DamID (MESH:C538228), cancer (MESH:D009369)
- **Chemicals:** calcium (MESH:D002118), auxin (MESH:D007210), formaldehyde (MESH:D005557), luciferin (MESH:D000090562), agarose (MESH:D012685), amino acids (MESH:D000596), ribonucleotide (MESH:D012265), desthiobiotin (MESH:C004749), Ca2+ (-), hydrogen peroxide (MESH:D006861), actinomycin D (MESH:D003609), SDS (MESH:D012967), biotin (MESH:D001710), bromouridine (MESH:C006824), cytosine (MESH:D003596), polyacrylamide (MESH:C016679), metal (MESH:D008670), oxygen (MESH:D010100)
- **Species:** Homo sapiens (human, species) [taxon 9606], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702], Mus musculus (house mouse, species) [taxon 10090], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Cypridina (genus) [taxon 261517], Escherichia coli (E. coli, species) [taxon 562]
- **Cell lines:** YAC — Mus musculus (Mouse), Mouse lymphoma, Cancer cell line (CVCL_2244), HeLas — Homo sapiens (Human), Human papillomavirus-related endocervical adenocarcinoma, Cancer cell line (CVCL_0058), CUT&Tag — Mus musculus (Mouse), Transformed cell line (CVCL_6363)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12951958/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12951958/full.md

## References

281 references — full list in the complete paper: https://tomesphere.com/paper/PMC12951958/full.md

---
Source: https://tomesphere.com/paper/PMC12951958