Identification of the New GmJAG1 Transcription Factor Binding Motifs Using DAP-Seq
Jinxing Wang, Zigang Pu, Weiyao Zhang, Mengnan Qu, Lusi Gao, Wenjing Pan, Yanan Sun, Chunxu Fu, Ling Zhang, Mingkun Huang, Yufang Hu

TL;DR
This study identifies new DNA binding motifs for the GmJAG1 transcription factor in soybean, which helps regulate leaf shape, seed number, and flower size.
Contribution
The study discovers and experimentally validates two novel GmJAG1 binding motifs using DAP-seq and EMSA.
Findings
Two new GmJAG1 binding motifs were identified using DAP-seq.
EMSA experiments confirmed GmJAG1's binding to these motifs.
Potential regulatory network mediated by GmJAG1 was constructed.
Abstract
Interaction between transcription factors (TFs) and motifs is essential for gene regulation and the subsequent phenotype formation. Soybean (Glycine max) JAGGEED 1 (GmJAG1) is a key TF that controls leaf shape, seed number and flower size. To understand the GmJAG1 binding motifs, in this study, we performed the GmJAG1 DNA affinity purification sequencing (DAP-seq) experiment, which is a powerful tool for the de novo motif prediction method. Two new significant GmJAG1 binding motifs were predicted and the EMSA experiments further verified the ability of GmJAG1 bound to these motifs. The potential binding sites in the downstream gene promoter were identified through motif scanning and a potential regulatory network mediated by GmJAG1 was constructed. These results served as important genomic resources for further understanding the regulatory mechanism of GmJAG1.
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2- —Innovation Project of Heilongjiang Academy of Agricultural Sciences
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPlant Molecular Biology Research · Soybean genetics and cultivation · Genomics and Chromatin Dynamics
1. Introduction
Recognition of specific conserved cis-regulatory elements (e.g., motifs) in promoters or enhancers by specific transcription factors (TFs) is essential for gene regulation, which subsequently affects the organism phenotype [1,2]. Due to the rapid development of next-generation sequencing, high-throughput sequencing data facilitated the discovery of TF motifs [3]. To date, thousands of conserved motifs have been deposited in public databases such as TRANSFAC [4], JASPAR [5], and some plant-specific motif databases like PlantTFDB [6] or PlanPAN [7], etc. Typically, each TF family recognizes a collection of similar conserved DNA sequences, whose binding presence can be represented as the motif position weight matrices (PWMs). However, the current catalog of TF binding motifs remains incomplete, making it difficult to understand the TF regulatory network, especially for a robust TF with a favored phenotype, such as the soybean (Glycine max) JAGGED 1 (GmJAG1), a C2H2 type zinc finger (C2H2-zf) TF [8,9].
GmJAG1 is located at the famous Ln loci controlling multiple developmental process in soybean, such as leaf shape, seed number, flower size, etc. [9,10]. The specific GmJAG1 binding motifs are far from being understood and, recently, we identified a new GmJAG1 motif (named Motif0 [M0] in this study) with core sequence GTTGGA from GmJAG1 chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) data which are largely different from other TF motifs in public databases [11]. The fact that TF may have multi-target motifs has been reported [12]; however, as a key TF in soybean, it is still unknown whether GmJAG1 is able to recognize other conserved motifs. Although the GmJAG1 ChIP-seq data are available, it still has several limitations for new motifs discovered using these ChIP-seq data [13,14]. One of the major drawbacks is that the ChIP-seq was performed in vivo, which means that TF binding to the enriched peak region could occur through other interacting TF partners [13], which is supported by the observation of enrichment of other TF binding motifs such as ERF, bHLH, etc., in ChIP-seq data [11]. Using these “indirect” GmJAG1 binding regions as input for motif prediction would greatly influence the discovery of novel motifs or lead to the discovery of other TF motifs that are not bound by GmJAG1.
By contrast, DNA affinity purification sequencing (DAP-seq) is able to pull down the DNA fragments directly in vitro, avoiding the binding noise from other interacting TF partners [15]. To identify some new GmJAG1 binding motifs, we apply the DAP-seq using in vitro expressed GST-tagged GmJAG1 protein. Finally, we successfully identified two new GmJAG1 motifs (named Motif1 [M1] and Motif2 [M2]) which were further confirmed by the subsequent Electrophoretic Mobility Shift Assay (EMSA) experiments, suggesting a multiple targeted role of GmJAG1 to soybean genome and partially supporting a multiple function of GmJAG1 in the developmental process of soybean.
2. Results
2.1. Summary of GmJAG1 DAP-Seq Data
To confirm whether GmJAG1 is able to bind some new motifs, we performed GmJAG1 DAP-seq experiments, generating DAP and input libraries. After sequencing, we obtained 73 million (M) and 40 M reads for the DAP and input libraries, respectively (Table S1). Both the GmJAG1 DAP-seq and the input control libraries show a high mapping rate (>90%) against the soybean reference genome (Table S1). In total, 2750 GmJAG1 highly enriched regions were identified (we chose an 8-fold enrichment cut-off for peak calling via HOMER v4.11 software). Approximately 100–200 peaks were distributed across each chromosome (Figure 1A, Table S2). The average GC content of these GmJAG1 enriched peaks (48.7%) is significantly higher than the random genomic sequence (34.6%) (p-value < 2.2 × 10^−16^) (Figure 1B), indicating that GmJAG1 prefers a high GC sequence. Furthermore, we observed that 2142, 384, 203 and 159 enriched peaks located in the distal intergenic, genic, promoter and downstream regions, respectively (Figure 1C). However, compared to GmJAG1 ChIP-seq, only 66 (2.4%) DAP-seq enriched peaks were observed that overlapped with ChIP-seq enriched peaks (Figure 1D; Table S2), indicating the large differences in GmJAG1 binding in vivo and in vitro. Nevertheless, these selectively highly enriched peak sequences by DAP-seq served as highly confident DNA input for new motif prediction using computer software (e.g., HOMER).
2.2. Prediction and Validation of Two New GmJAG1 Binding Motifs
As mentioned above, the enriched DNA sequences in DAP-seq were used for motif prediction via HOMER software. We observed that two motifs with the consensus sequences, ACGCCACT and ACTGGCAG, showed high significant enrichment (p value < 1 × 10^−90^) compared to the background genomic sequences. These two motifs were referred as GmJAG1 Motif 1(M1) and Motif 2 (M2), respectively, which showed less similarity to the previous GmJAG1 M0 motif (E-value > 1 × 10^−5^) (Figure 2A; Table S3). We also performed EMSA experiments using GST-GmJAG1 recombinant protein and M1, M2 biotin-labeled probes. As Figure 2B shows, GmJAG1 is able to bind to these motifs, supporting the theory that M1 and M2 are the new GmJAG1 binding motifs.
Next, we scanned these three GmJAG1 binding motifs (M0, M1 and M2) in the gene promoters to predict the potential GmJAG1 target genes. In total, 11,000, 2231 and 3399 gene promoters harbored M0, M1 and M2 were found, respectively, suggesting a potential GmJAG1-mediated regulatory network (Figure 2C,D; Table S4), which also extends our knowledge of GmJAG1 recognition diversity. Since genomic editing at cis-regulatory elements could be an efficient strategy for further crop improvement [16,17,18], the identification of new GmJAG1 binding motifs and its potential bound regions in the genome served as important resources for design of the guided RNA oligos for the CRISPR/Cas9 system.
3. Discussion
As a key transcription factor, understanding the specific binding motifs of GmJAG1 could help to elucidate its regulatory pattern in the soybean developmental process [3,8,9]. In this study, by coupling the DAP-seq method and the EMSA experiments we successfully discovered two new binding motifs for GmJAG1, confirming the multi-targeted binding sites of GmJAG1 in the soybean genome.
Conserved DNA motifs are essential for TF recognition and identification of the specific TF motifs is the key step in interpreting the complicated regulatory network [1,2,19]. However, because the conserved motifs are often short and typically degenerate, the discovery of motifs in the plant genome remains a challenging task. In the model plant Arabidopsis, approximately 1700 TF motif collections have been deposited in the public motifs databases such as PlantPAN and PlantTFDB [6,7], representing about 50–60% of the TFs in Arabidopsis. In contrast, only about 370 out of ~4000 soybean TFs have corresponding motifs in these databases and most of them are predicted from the known motifs of the homologous TFs in other species [6]. In contrast to human or model plants (e.g., Arabidopsis), there are many databases with abundant sequencing data for understanding the function of non-coding regions, such as the human ENCODE (https://www.encodeproject.org accessed on 1 May 2024) and the Arabidopsis Cistrome database [20]. Although soybean also has abundant DNA resequencing data characterizing the single nucleotide polymorph (SNP) or structural variation (SV) at the genomic level, for example, the recently published soybean PAN-genome data [21], few high-throughput sequencing data (e.g., ATAC-seq) [22,23,24] have been aimed at finding these functional non-coding elements (e.g., motifs) in the soybean genome. Compared to the ChIP-seq method [14], a traditional strategy for discovering new motifs, DAP-seq [15] is able to select the DNA fragments directly bound by the recombinant TF in vitro, which showed some advantage over the ChIP-seq in finding motifs. Therefore, DAP-seq may be prioritized in further studies to elucidate soybean TF binding motifs, but it is a long-term undertaking. Since the variation in the non-coding region is associated with soybean domestication [16] and manipulation of these binding motifs via genome editing [17,18] is one of the efficient strategies used to improve agronomic traits, the understanding of GmJAG1 binding motifs would be integrated into large scale DNA re-sequencing data of soybean to uncover some potential functional GmJAG1 motif variation in soybean domestication.
4. Materials and Methods
4.1. Preparation of GmJAG1 DAP-Seq Library
DAP-seq library preparation was conducted according to the method described in a previous report [15]. Briefly, the full-length coding sequence of GmJAG1 (Glyma.20G116200) was cloned into the pGEX-4T-1 vector and transformed into the E. coil (Rosetta DE3). The GST-GmJAG1 recombinant protein was induced by 0.2 mM IPTG at 16 °C overnight and purified by the MagneGST beads (Promega, Beijing, China, cat.no.V8600). The DNA library was prepared using genomic DNA from a whole soybean plant via VAHTS Universal Pro DNA Library Prep Kit for Illumina Kit (Vazyme, Nanjing, China, cat.no.ND608) according to the manufacturer’s instructions. About 100 ng of DNA library was incubated with the beads and GmJAG1 recombinant protein mixture at room temperature for 1 h. The MagneGST beads were washed three times with 1 × PBS + NP40 (0.005%), followed by two washes with 1X PBS. After washing, the enriched DNA fragments pulled down by GST-GmJAG1 were recovered and amplified for 12 cycles using primers from the VAHTS Multiplex Oligos Set 4/5 for Illumina Kit (Vazyme, Nanjing, China, cat.no. N321). The PCR procedure was set to 98 °C for 1 min, 98 °C for 15 s, 60 °C for 20 s and 72 °C for 20 s. The final DAP-seq library was sent for Illumina sequencing in 150 bp paired-end mode. The primers are listed in Table S5.
4.2. DAP-Seq Data Processing and Motif Analysis
The adaptors and reads with low quality in the raw data of DAP-seq were removed by Trim_galore v6.10 software (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ (accessed on 1 May 2024)). Then, the filtered DAP-seq reads with high sequencing quality were mapped to the Williams 82 V4 reference genome (https://phytozome-next.jgi.doe.gov (accessed on 1 May 2024)) using bowtie2 v2.54 software [25]. Mapped reads with MAPQ value over 30 were extracted by Samtools [26] and used for peak calling via HOMER v4.11 (http://homer.ucsd.edu/homer/ (accessed on 1 May 2024)) with the following parameters: findPeaks-style factor-F 8. The script “getFocalPeaks.pl” in HOMER was used to extract the enriched regions with a focal score over 0.9, which were retained for subsequent motif prediction. Motif prediction and scanning were performed using the scripts, findMotifs.pl and findMotifsGenome.pl in the HOMER software with the parameters set as default.
4.3. EMSA Experiment
The EMSA experiment was conducted according to a previous study [11]. Briefly, the pGEX-4T-1 and pGEX-4T-1-GmJAG1 vectors were transformed into the Rosetta (DE3) E. coli competence cell as mentioned above. The GST and GST-GmJAG1 recombinant proteins were induced with 0.2 mM IPTG at 16 °C overnight and purified by the MagneGST beads (Promega, Beijing, China, cat.no.V8600) according to the manufacturer’s instructions. The 5′-end biotinylated oligonucleotide containing the GmJAG1 putative binding motifs served as the probes. The unlabeled wild type or mutant oligonucleotides served as the cold competitors. The recombinant proteins (1 μg) were mixed with the labeled probes with or without cold competitors in 20 μL binding buffer at room temperature for 0.5 h. The protein–probes mixture was then loaded into the native gel. The gel shift assay was performed via LightShift Chemiluminescent EMSA Kit (Thermo, Waltham, MA, USA, cat.no.20148) and detected by ChemiDoc Imaging Systems (Bio-Rad, Shanghai, China). Probe sequences are listed in Table S5.
5. Conclusions
As a key TF involved in multiple developmental process of soybean, the binding motifs of GmJAG1 and its potential binding regions in the soybean genome are still elusive. In this study, two new GmJAG1 binding motifs were predicted by DAP-seq and their binding ability was further validated by EMSA experiments. Taken together, these findings served as important genomic resources for further understanding the regulatory mechanism of GmJAG1.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Priest H.D. Filichkin S.A. Mockler T.C. Cis-regulatory elements in plant cell signaling Curr. Opin. Plant Biol.20091264364910.1016/j.pbi.2009.07.01619717332 · doi ↗ · pubmed ↗
- 2Spitz F. Furlong E.E. Transcription factors: From enhancer binding to developmental control Nat. Rev. Genet.20121361362610.1038/nrg 320722868264 · doi ↗ · pubmed ↗
- 3Inukai S. Kock K.H. Bulyk M.L. Transcription factor-DNA binding: Beyond binding site motifs Curr. Opin. Genet. Dev.20174311011910.1016/j.gde.2017.02.00728359978 PMC 5447501 · doi ↗ · pubmed ↗
- 4Matys V. Fricke E. Geffers R. Gossling E. Haubrock M. Hehl R. Hornischer K. Karas D. Kel A.E. Kel-Margoulis O.V. TRANSFAC: Transcriptional regulation, from patterns to profiles Nucleic Acids Res.20033137437810.1093/nar/gkg 10812520026 PMC 165555 · doi ↗ · pubmed ↗
- 5Fornes O. Castro-Mondragon J.A. Khan A. van der Lee R. Zhang X. Richmond P.A. Modi B.P. Correard S. Gheorghe M. Baranasic D. JASPAR 2020: Update of the open-access database of transcription factor binding profiles Nucleic Acids Res 202048 D 87D 9210.1093/nar/gkz 100131701148 PMC 7145627 · doi ↗ · pubmed ↗
- 6Guo A.Y. Chen X. Gao G. Zhang H. Zhu Q.H. Liu X.C. Zhong Y.F. Gu X. He K. Luo J. Plant TFDB: A comprehensive plant transcription factor database Nucleic Acids Res.200836 D 966D 96910.1093/nar/gkm 84117933783 PMC 2238823 · doi ↗ · pubmed ↗
- 7Chow C.N. Lee T.Y. Hung Y.C. Li G.Z. Tseng K.C. Liu Y.H. Kuo P.L. Zheng H.Q. Chang W.C. Plant PAN 3.0: A new and updated resource for reconstructing transcriptional regulatory networks from Ch IP-seq experiments in plants Nucleic Acids Res.201947 D 1155 D 116310.1093/nar/gky 108130395277 PMC 6323957 · doi ↗ · pubmed ↗
- 8Ohno C.K. Reddy G.V. Heisler M.G. Meyerowitz E.M. The Arabidopsis JAGGED gene encodes a zinc finger protein that promotes leaf tissue development Development 20041311111112210.1242/dev.0099114973281 · doi ↗ · pubmed ↗
