# Regulus infers signed regulatory relations from few samples’ information using discretization and likelihood constraints

**Authors:** Marine Louarn, Guillaume Collet, Ève Barré, Thierry Fest, Olivier Dameron, Anne Siegel, Fabrice Chatonnet, Ilya Ioshikhes

PMC · DOI: 10.1371/journal.pcbi.1011816 · 2024-01-22

## TL;DR

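Regulus is a method that infers signed (activating or inhibiting) transcription factor-gene regulatory relations from few samples, by discretizing expression and accessibility data into patterns and filtering candidate relations with biology-based likelihood constraints.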

## Contribution

Regulus integrates TF binding, gene expression, and region accessibility data with biological constraints to infer signed regulatory relations from few samples.

## Key findings

- Regulus identifies both known and new regulators consistent with gene expression and region accessibility data.
- The method includes lowly expressed genes in regulatory relations and considerably reduces the space of putative TF-gene relations.
- It applies likelihood constraints to qualify regulatory relations as activation or inhibition.

## Abstract

Transcriptional regulation is performed by transcription factors (TFs) binding to DNA in context-dependent regulatory regions, and it determines the activation or inhibition of gene expression. Current methods for inferring transcriptional regulatory circuits, based on one or all of TF, region and gene activity measurements, require a large number of samples to rank the candidate TF-gene regulatory relations and rarely predict whether these are activations or inhibitions.

We hypothesize that transcriptional regulatory circuits can be inferred from fewer samples by (1) fully integrating information on TF binding, gene expression and regulatory region accessibility, (2) reducing data complexity and (3) using biology-based likelihood constraints to determine the global consistency between a candidate TF-gene relation and the patterns of gene expression and region activation, and to qualify regulations as activations or inhibitions.

We introduce Regulus, a method which computes TF-gene relations from gene expression, regulatory region activity and TF binding site data, together with the genomic locations of all entities. After gene expressions and region activities are aggregated into patterns, the data are integrated into an RDF (Resource Description Framework) endpoint. A dedicated SPARQL (SPARQL Protocol and RDF Query Language) query retrieves all potential relations between expressed TFs and genes involving active regulatory regions. These TF-region-gene relations are then filtered using biological likelihood constraints, which also allow them to be qualified as activation or inhibition. Regulus provides signed relations consistent with public databases and, when applied to biological data, identifies both known and potential new regulators. Regulus is devoted to context-specific inference of transcriptional circuits in human settings where samples are scarce and cell populations are closely related, using discretization into patterns and likelihood reasoning to decipher the most robust regulatory relations.
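The discretize-then-filter idea described above can be illustrated with a minimal sketch. It is not the Regulus implementation: the three-level discretization, the `low`/`high` thresholds, the consistency rule in `sign_relation`, and all entity names (`TF_A`, `region_1`, `GENE_X`, ...) are assumptions made for illustration, and the RDF/SPARQL retrieval step is replaced by a hard-coded list of candidate TF-region-gene triples.

```python
# Illustrative sketch only: thresholds, rule and names are hypothetical,
# not taken from the Regulus paper.

def discretize(values, low, high):
    """Map each per-population value to -1 (low), 0 (medium) or +1 (high)."""
    return tuple(-1 if v < low else 1 if v > high else 0 for v in values)


def sign_relation(tf_pat, region_pat, gene_pat):
    """Return '+', '-' or None for one candidate TF-region-gene triple.

    Assumed rule: only populations where the region is active (+1) and both
    the TF and the gene have an informative level (non-zero) are considered.
    If TF and gene always agree there, call it activation; if they always
    disagree, inhibition; otherwise the candidate is discarded.
    """
    votes = set()
    for tf, region, gene in zip(tf_pat, region_pat, gene_pat):
        if region != 1 or tf == 0 or gene == 0:
            continue
        votes.add(tf * gene)  # +1 if same direction, -1 if opposite
    if votes == {1}:
        return "+"
    if votes == {-1}:
        return "-"
    return None  # no evidence, or contradictory evidence


def infer_signed(candidates, expression, accessibility, low=2.0, high=5.0):
    """Keep only candidates that receive a consistent sign.

    Simplification: the same thresholds are used for expression and
    accessibility values.
    """
    signed = {}
    for tf, region, gene in candidates:
        sign = sign_relation(
            discretize(expression[tf], low, high),
            discretize(accessibility[region], low, high),
            discretize(expression[gene], low, high),
        )
        if sign is not None:
            signed[(tf, region, gene)] = sign
    return signed


# Toy data: one value per cell population (four populations).
expression = {
    "TF_A": [9.0, 8.5, 1.0, 0.5],
    "GENE_X": [7.0, 6.0, 0.8, 0.2],
    "GENE_Y": [0.3, 0.5, 6.5, 8.0],
    "GENE_Z": [7.0, 0.5, 6.0, 0.3],
}
accessibility = {"region_1": [6.0, 6.0, 5.5, 7.0]}
candidates = [
    ("TF_A", "region_1", "GENE_X"),
    ("TF_A", "region_1", "GENE_Y"),
    ("TF_A", "region_1", "GENE_Z"),
]

signed = infer_signed(candidates, expression, accessibility)
for relation, sign in signed.items():
    print(relation, sign)
```

In this toy run, `GENE_X` follows `TF_A` across populations and is labelled as activated, `GENE_Y` varies in the opposite direction and is labelled as inhibited, and `GENE_Z` gives contradictory evidence and is filtered out. Working on discretized patterns rather than raw values is what lets such a rule be evaluated with only a handful of samples.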

Gene expression regulation is based on the activity of specialized regulatory proteins called transcription factors (TFs), which can bind DNA at specific sequences. Understanding the regulatory relations between TFs and genes in humans is fundamental in personalized clinical settings, to better decipher pathological mechanisms and to identify new therapeutic solutions. However, finding the main regulators of such systems is usually difficult, due to the scarcity of available samples and the biological closeness of the studied cell types. To overcome these issues, we introduce a new tool called Regulus. We use information from gene and TF expression, regulatory region activity and TF binding site occurrences to compute TF-gene relations. We then apply a likelihood reasoning step, based on biological knowledge of transcriptional regulation mechanisms, to select the most probable relations and assign each of them a function, either activation or inhibition. Finally, we reduce the list of potential TFs with a specificity / coverage filter and annotate it according to the existing literature. By testing Regulus on large-scale biological datasets, each describing four biological contexts, we show that this tool is able to i) identify both known and undescribed regulators consistent with all the gene expression and region accessibility constraints in each biological context, ii) include lowly expressed genes in its relations and iii) considerably limit the space of putative TF-gene relations.
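The specificity / coverage filter mentioned above is not defined in this summary, so the sketch below relies on assumed definitions: for a given set of genes sharing an expression pattern, coverage is taken as the share of those genes a TF is predicted to regulate, and specificity as the share of the TF's predicted targets that fall in that set. The function names, thresholds and toy identifiers are all hypothetical.

```python
# Illustrative sketch only: the definitions of coverage and specificity and
# the 0.5 thresholds are assumptions, not taken from the Regulus paper.

def coverage_and_specificity(tf_targets, pattern_genes):
    """Return (coverage, specificity) of one TF for one gene pattern."""
    hits = tf_targets & pattern_genes
    coverage = len(hits) / len(pattern_genes) if pattern_genes else 0.0
    specificity = len(hits) / len(tf_targets) if tf_targets else 0.0
    return coverage, specificity


def filter_tfs(relations, pattern_genes, min_coverage=0.5, min_specificity=0.5):
    """Keep TFs whose predicted targets both cover and focus on the pattern."""
    targets = {}
    for tf, gene in relations:
        targets.setdefault(tf, set()).add(gene)
    kept = {}
    for tf, genes in targets.items():
        cov, spec = coverage_and_specificity(genes, pattern_genes)
        if cov >= min_coverage and spec >= min_specificity:
            kept[tf] = (cov, spec)
    return kept


# Toy data: genes g1..g4 share one expression pattern.
pattern_genes = {"g1", "g2", "g3", "g4"}
relations = [
    ("TF_A", "g1"), ("TF_A", "g2"), ("TF_A", "g3"),
    ("TF_B", "g1"), ("TF_B", "g5"), ("TF_B", "g6"), ("TF_B", "g7"),
    ("TF_C", "g1"), ("TF_C", "g2"), ("TF_C", "g8"),
    ("TF_C", "g9"), ("TF_C", "g10"), ("TF_C", "g11"),
]

kept = filter_tfs(relations, pattern_genes)
print(kept)
```

Under these assumptions only `TF_A` is retained: `TF_B` reaches too few genes of the pattern, and `TF_C` covers half of the pattern but spreads most of its targets elsewhere. Requiring both criteria is one way to shorten the TF list to regulators that are plausibly characteristic of a given context.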

## Linked entities

- **Proteins:** TF (transcription factors)
- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Genes:** FOXJ3 (forkhead box J3) [NCBI Gene 22887], IRF8 (interferon regulatory factor 8) [NCBI Gene 3394] {aka H-ICSBP, ICSBP, ICSBP1, IMD32A, IMD32B, IRF-8}, PAX5 (paired box 5) [NCBI Gene 5079] {aka ALL3, BSAP, PAX-5}, MYC (MYC proto-oncogene, bHLH transcription factor) [NCBI Gene 4609] {aka MRTL, MYCC, bHLHe39, c-Myc}, PRDM1 (PR/SET domain 1) [NCBI Gene 639] {aka BLIMP-1, BLIMP1, PRDI-BF1}, BACH2 (BACH transcriptional regulator 2) [NCBI Gene 60468] {aka BTBD25, IMD60}, CD274 (CD274 molecule) [NCBI Gene 29126] {aka ADMIO5, B7-H, B7H1, PD-L1, PDCD1L1, PDCD1LG1}, IRF4 (interferon regulatory factor 4) [NCBI Gene 3662] {aka IMD131, LSIRF, MUM1, NF-EM5, SHEP8}, CYP1A1 (cytochrome P450 family 1 subfamily A member 1) [NCBI Gene 1543] {aka AHH, CP11, CYP1, CYPIA1, P1-450, P450-C}, YY1 (YY1 transcription factor) [NCBI Gene 7528] {aka DELTA, GADEVS, INO80S, NF-E1, UCRBP, YIN-YANG-1}, ZNF219 (zinc finger protein 219) [NCBI Gene 51222] {aka ZFP219}, F3 (coagulation factor III, tissue factor) [NCBI Gene 2152] {aka CD142, TF, TFA}, TGIF1 (TGFB induced factor homeobox 1) [NCBI Gene 7050] {aka HPE4, TGIF}, AHR (aryl hydrocarbon receptor) [NCBI Gene 196] {aka FVH3, RP85, bHLHe76}, STAT3 (signal transducer and activator of transcription 3) [NCBI Gene 6774] {aka ADMIO, ADMIO1, APRF, HIES}, XBP1 (X-box binding protein 1) [NCBI Gene 7494] {aka TREB-5, TREB5, XBP-1, XBP2}, ZNF75A (zinc finger protein 75A) [NCBI Gene 7627], LGR5 (leucine rich repeat containing G protein-coupled receptor 5) [NCBI Gene 8549] {aka FEX, GPR49, GPR67, GRP49, HG38}, BCL6 (BCL6 transcription repressor) [NCBI Gene 604] {aka BCL5, BCL6A, LAZ3, ZBTB27, ZNF51}, KLF16 (KLF transcription factor 16) [NCBI Gene 83855] {aka BTEB4, DRRF, NSLP2}, LEF1 (lymphoid enhancer binding factor 1) [NCBI Gene 51176] {aka ECTD1, ECTD17, LEF-1, TCF10, TCF1ALPHA, TCF7L3}, SP4 (Sp4 transcription factor) [NCBI Gene 6671] {aka HF1B, SPR-1}, TFAP4 (transcription factor AP-4) [NCBI Gene 7023] {aka AP-4, bHLHc41}
- **Diseases:** immunodeficiencies (MESH:D007153), cancer (MESH:D009369), hematological malignancies (MESH:D019337), MBC (MESH:D015448), autoimmune diseases (MESH:D001327)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10833539/full.md

---
Source: https://tomesphere.com/paper/PMC10833539