# Mixing features of transcription factors and genes enable accurate prediction of gene regulation relationships for unknown transcription factors

**Authors:** Risa Okubo, Takashi Morikura, Yusuke Hiki, Yuta Tokuoka, Tetsuya J Kobayashi, Takahiro G Yamada, Akira Funahashi

PMC · DOI: 10.1093/nargab/lqag022 · NAR Genomics and Bioinformatics · 2026-02-25

## TL;DR

GReNIMJA, a deep learning model, predicts whether a transcription factor (TF) regulates a gene, including for unknown TFs, by mixing features of the TF amino acid sequence with features of the target gene's nucleotide sequence.

## Contribution

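The paper introduces GReNIMJA (Gene Regulatory Network Inference by Mixing and Jointing features of Amino acid and nucleotide sequences). The model mixes TF amino acid sequence features with target-gene nucleotide sequence features in a 2D Long Short-Term Memory architecture and classifies each TF-gene pair as regulated or not regulated, which allows it to make predictions for unknown TFs.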

## Key findings

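- The model predicted regulatory relationships with 84.4% accuracy for known TFs, higher than conventional models.
- For unknown TFs, accuracy was 68.5%; the abstract describes this setting as an unsolved task for conventional deep learning-based models, which are trained to predict known TFs only.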
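
The distinction between known and unknown TFs can be illustrated with a small evaluation sketch. It assumes that "unknown" means the TF never appears in the training data, which is one reading of the abstract rather than the paper's documented protocol. The function names (`split_by_tf`, `accuracy`), the 20% hold-out fraction, and the toy TF and gene identifiers are all hypothetical.

```python
import random


def split_by_tf(pairs, holdout_fraction=0.2, seed=0):
    """Split (tf, gene, label) triples so that held-out TFs never occur in training.

    Assumption: an "unknown" TF is one with no training examples at all.
    """
    tfs = sorted({tf for tf, _, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(tfs)
    n_holdout = max(1, round(len(tfs) * holdout_fraction))
    unknown_tfs = set(tfs[:n_holdout])
    train = [p for p in pairs if p[0] not in unknown_tfs]
    unknown_test = [p for p in pairs if p[0] in unknown_tfs]
    return train, unknown_test


def accuracy(labels, predictions):
    """Fraction of binary predictions that match the labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(int(y == p) for y, p in zip(labels, predictions))
    return correct / len(labels)


# Toy TF-gene pairs: label 1 = regulatory relationship present, 0 = absent.
pairs = [
    ("TF_A", "gene1", 1), ("TF_A", "gene2", 0),
    ("TF_B", "gene1", 0), ("TF_B", "gene3", 1),
    ("TF_C", "gene2", 1), ("TF_C", "gene3", 0),
    ("TF_D", "gene1", 1), ("TF_D", "gene2", 1),
    ("TF_E", "gene3", 0), ("TF_E", "gene1", 0),
]
train, unknown_test = split_by_tf(pairs, holdout_fraction=0.2)
```

Splitting by TF rather than by pair keeps every example of a held-out TF out of training, so accuracy on `unknown_test` measures generalization to new TFs rather than to new genes for familiar TFs.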

## Abstract

Identifying regulatory relationships between transcription factors (TFs) and genes is essential to understand diverse biological phenomena related to gene expression. Recently, deep learning-based models to predict TFs that bind to genes from nucleotide sequences of the target genes have been developed, yet these models are trained to predict known TFs only. Here, we developed a deep learning model, GReNIMJA (Gene Regulatory Network Inference by Mixing and Jointing features of Amino acid and nucleotide sequences), to predict gene regulation even by unknown TFs. Our model is designed to mix the features of the TF amino acid sequences and nucleotide sequences of the target genes using a 2D Long Short-Term Memory architecture and to perform binary classification with the aim of determining the presence or absence of a regulatory relationship. By explicitly modeling interactions between TFs and genes, our model can predict gene regulation for unknown TFs. The accuracy of our model in predicting regulatory relationships was 84.4% for known TFs (higher than those of conventional models) and 68.5% for unknown TFs; the latter is an unsolved task for conventional deep learning-based models. We expect our model to advance identification of unknown gene regulatory networks and contribute to the understanding of diverse biological phenomena.
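
To make the idea of mixing the two sequence types more concrete, here is a minimal, dependency-free sketch. It is not the GReNIMJA implementation: it swaps the 2D Long Short-Term Memory for a plain tanh recurrence over a two-dimensional grid (no gates, scalar hidden state), uses random untrained weights, and every name in it (`mix_features`, `scan_2d`, `predict_regulation`, the embedding size, the toy sequences) is a hypothetical choice made for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
NUCLEOTIDES = "ACGT"


def make_embedding(alphabet, dim, rng):
    """Random (untrained) embedding vector for each symbol of the alphabet."""
    return {ch: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for ch in alphabet}


def mix_features(tf_seq, gene_seq, aa_emb, nt_emb):
    """Build a grid whose cell (i, j) joins residue i of the TF with base j of the gene."""
    if not tf_seq or not gene_seq:
        raise ValueError("both sequences must be non-empty")
    return [[aa_emb[a] + nt_emb[n] for n in gene_seq] for a in tf_seq]


def scan_2d(grid, w_in, w_up, w_left):
    """Simplified 2D recurrence: each cell sees its input plus the states above and to the left.

    Assumption: a gate-free tanh cell stands in for a real 2D LSTM cell.
    """
    rows, cols = len(grid), len(grid[0])
    h = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            x = sum(w * v for w, v in zip(w_in, grid[i][j]))
            up = h[i - 1][j] if i > 0 else 0.0
            left = h[i][j - 1] if j > 0 else 0.0
            h[i][j] = math.tanh(x + w_up * up + w_left * left)
    return h[-1][-1]  # state after scanning the whole grid


def predict_regulation(tf_seq, gene_seq, params):
    """Return a probability-like score that the TF regulates the gene."""
    grid = mix_features(tf_seq, gene_seq, params["aa_emb"], params["nt_emb"])
    state = scan_2d(grid, params["w_in"], params["w_up"], params["w_left"])
    logit = params["w_out"] * state + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid for binary classification


rng = random.Random(0)
dim = 4
params = {
    "aa_emb": make_embedding(AMINO_ACIDS, dim, rng),
    "nt_emb": make_embedding(NUCLEOTIDES, dim, rng),
    "w_in": [rng.uniform(-0.5, 0.5) for _ in range(2 * dim)],
    "w_up": 0.5,
    "w_left": 0.5,
    "w_out": 1.0,
    "b_out": 0.0,
}
prob = predict_regulation("MKVLA", "ACGTTGCA", params)  # toy sequences
```

Because the score depends only on the two sequences and not on a fixed list of TF labels, a model of this shape can in principle be applied to a TF it has never seen, which is the property the abstract attributes to explicitly modeling TF-gene interactions.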

## Full-text entities

- **Genes:** F3 (coagulation factor III, tissue factor) [NCBI Gene 2152] {aka CD142, TF, TFA}
- **Diseases:** LSTM (MESH:D000088562), cancers (MESH:D009369), TF (MESH:D005171)
- **Chemicals:** amino acid (MESH:D000596), AT (MESH:D001246)
- **Species:** Homo sapiens (human, species) [taxon 9606], Rattus norvegicus (brown rat, species) [taxon 10116], Canis lupus familiaris (dog, subspecies) [taxon 9615], Mus musculus (house mouse, species) [taxon 10090]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12954442/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12954442/full.md

## References

73 references — full list in the complete paper: https://tomesphere.com/paper/PMC12954442/full.md

---
Source: https://tomesphere.com/paper/PMC12954442