A tryptophan-phenylalanine binding motif for the histone methyltransferases MLL4 and MLL3
Soumi Biswas, Zohreh Tavaf, Caroline Benz, Moustafa Khalil, Dustin C. Becht, Leandro Simonetti, M. Andres Blanco, El Bachir Affar, Ylva Ivarsson, Tatiana G. Kutateladze

TL;DR
This study identifies a specific amino acid motif that binds to MLL4 and MLL3 enzymes, which are important in epigenetic regulation and cancer.
Contribution
The discovery of a tryptophan-phenylalanine motif that directly interacts with MLL4 and MLL3 is novel.
Findings
The sixth PHD finger of MLL4 and the seventh PHD finger of MLL3 bind to the tryptophan-phenylalanine motif.
Mutational and binding analyses reveal the molecular mechanism of the interaction.
MLL4/MLL3 and motif-containing proteins are co-expressed in several tumor types.
Abstract
The human methyltransferases mixed lineage leukemia 4 and 3 (MLL4 and MLL3) play pivotal roles in the regulation of epigenetic and transcriptional programs. Here, we report the identification and characterization of a tryptophan-phenylalanine binding motif recognized by MLL4 and MLL3. Binding of the sixth PHD finger of MLL4 and the seventh PHD finger of MLL3 to the tryptophan-phenylalanine motif derived from a set of human proteins was detected in a proteomic peptide-phage display screen of intrinsically disordered regions of the human proteome and confirmed by NMR and MST assays. Mutational, genetic and binding interface analyses reveal the molecular mechanism underlying the direct interaction of MLL4 and MLL3 with the motif. A high correlation between the expression of MLL4/MLL3 and that of the motif-containing proteins in several tumor types suggests shared roles in oncogenic transcriptional programs. In…
Mixed lineage leukemia 4 (MLL4) and its homolog MLL3 are evolutionarily conserved human enzymes and well-recognized epigenetic regulators. MLL3/MLL4 regulate gene expression during development and are often mutated in developmental diseases and cancer (1, 2, 3, 4, 5). They belong to the MLL/KMT2 (lysine methyltransferase 2) family of enzymes, whose individual members produce distinct histone H3K4 methylation states and levels in distinct genomic regions (6, 7, 8). For example, MLL4 is a major mono-methyltransferase that deposits H3K4me1 primarily in enhancer regions and plays pivotal roles in enhancer activation, enhancer–promoter communication, and the regulation of transcriptional programs in general (9, 10, 11, 12, 13, 14, 15).
MLL4, or KMT2D (lysine methyltransferase 2D) according to the new nomenclature, and MLL3, also known as KMT2C, are gigantic proteins. They contain a large set of zinc knuckles and plant homeodomain (PHD) fingers, i.e. seven PHD fingers in MLL4 and eight PHD fingers in MLL3, and the catalytic SET (Su(var)3-9, Enhancer of Zeste, Trithorax) domain (16). In MLL4, the first six PHD fingers are assembled into two triple PHD finger cassettes (Fig. 1A). We have recently reported that the second and third PHD fingers (MLL4_PHD2/3_) of the first cassette (MLL4_PHD1/2/3_) bind to the MBH (MLL3/4 binding helix) region found in the additional sex combs-like proteins and transcriptional coregulators ASXL1, ASXL2, and ASXL3, and that this function is conserved in MLL3 (17, 18). We also showed that the sixth PHD finger of MLL4 (MLL4_PHD6_) of the second cassette (MLL4_PHD4/5/6_) recognizes acetylated lysine 16 of histone H4 (H4K16ac) and the DNA dioxygenase TET3 (19, 20, 21). The biological roles of the other PHD fingers, the zinc knuckles, and the intrinsically disordered regions, which are predicted to encompass over 70% of the MLL4 sequence, remain unclear. Furthermore, the synergistic, antagonistic, or independent nature and implications of the known interactions of MLL4 and MLL3 are challenging to characterize in the context of the full-length proteins because of their large sizes, i.e. 5,537 and 4,911 amino acids, respectively.
Figure 1. Identification of the tryptophan-phenylalanine sequence as a ligand for MLL4_PHD456_. A, MLL4 domain architecture. PHD4, PHD5, the zinc knuckle and PHD6 of MLL4 form the triple PHD finger cassette MLL4_PHD456_. MLL4_PHD6_ is colored purple. B, overview of the proteomic peptide-phage display (ProP-PD) selection against MLL4_PHD456_. MLL4_PHD456_ was used as bait in biopanning against the human disorderome phage library, version 2, for four rounds of selection. C, representative set of peptides identified through the ProP-PD selection (see Table S1). Shown are gene names, peptide sequences and % NGS counts. D, position-specific scoring matrix (PSSM) based on the identified peptide ligands. The MLL4_PHD456_ interacting motif is shown. E–G, overlaid ^1^H,^15^N HSQC spectra of MLL4_PHD456_ collected upon titration with (E) ZBED2_MPB_ (aa 128–148), (F) MED12_MPB_ (aa 1694–1713), and (G) MED13_MPB_ (aa 338–353). Spectra are color coded according to the protein:peptide molar ratio.
In this work, we report previously uncharacterized non-histone binding partners of MLL4 and MLL3. The interaction of PHD fingers of MLL4 and MLL3 with a tryptophan-phenylalanine motif from a set of human proteins was identified in a proteomic IDR peptide-phage display screen and confirmed in biochemical and mutagenesis studies. We describe the molecular basis for motif recognition and show genomic co-occupancy and a strong expression correlation between MLL4/MLL3 and a set of motif-containing epigenetic coregulators.
Results and discussion
Identification of the tryptophan-phenylalanine motif
We have previously characterized binding of MLL4_PHD456_ to the methylcytosine dioxygenase TET3 (21). This interaction was detected in the proteomic peptide-phage display screen of intrinsically disordered regions (IDRs) of the human proteome (22, 23), where the TET3 sequence yielded a high score. In addition, analysis of the phage display selections and validation of the enrichment of binding phages by enzyme-linked immunosorbent assay (ELISA) identified other potential interactors of MLL4_PHD456_ (Fig. 1, B and C and Table S1). A SLiMSearch analysis using an IUPred disorder cutoff of 0.4 (24), followed by enrichment filtering based on shared interaction partners or functional annotations and a statistical cutoff of p < 1 × 10^-4^, produced a dataset of sequences from 35 motif-containing proteins (Table S1). The hits included both nuclear proteins, such as TEX13B, ZBED2, MED13, MED12, VRK2, and GEMIN2 (Fig. 1C), and cytosolic proteins (Table S1). Although these potential ligands of MLL4_PHD456_ displayed lower NGS scores than TET3, we noticed that all of them contain what appears to be a conserved motif, WxxF/Y, where x is a non-conserved residue, with a slight preference for a positively charged arginine or lysine in one of the two x positions (Fig. 1D).
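The WxxF/Y consensus can be scanned for directly in a protein sequence. The Python sketch below is a minimal illustration, not the SLiMSearch workflow used in the study: it ignores the IUPred disorder filter and the statistical cutoff, and the example sequence is hypothetical. A zero-width lookahead is used so that overlapping matches are all reported, and a helper checks the weak Arg/Lys preference at the x positions.

```python
import re

# WxxF/Y: Trp, any two residues, then Phe or Tyr. The pattern sits inside a
# zero-width lookahead so overlapping occurrences are all reported.
MPB_PATTERN = re.compile(r"(?=(W..[FY]))")


def find_mpb_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, tetrapeptide) for every WxxF/Y match."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in MPB_PATTERN.finditer(seq)]


def has_basic_spacer(tetrapeptide: str) -> bool:
    """True if either x position holds Arg or Lys (the weak preference in Fig. 1D)."""
    return any(residue in "RK" for residue in tetrapeptide[1:3])


# Hypothetical sequence, not taken from any of the proteins in Table S1.
hits = find_mpb_motifs("MSAWKEFPLWRRYQ")
print(hits)                                    # [(4, 'WKEF'), (10, 'WRRY')]
print([has_basic_spacer(h) for _, h in hits])  # [True, True]
```

A regular expression only captures the consensus; it does not weight positions the way the PSSM in Fig. 1D does, so it will also match sequences that the phage screen would not enrich.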
To assess whether the MLL4_PHD456_ cassette binds ZBED2, MED13, and MED12, we carried out NMR titration experiments. We recorded ^1^H,^15^N heteronuclear single quantum coherence (HSQC) spectra of ^15^N-labeled MLL4_PHD456_ while unlabeled ZBED2 (aa 128–148), MED13 (aa 338–353) or MED12 (aa 1694–1713) peptides were titrated into the NMR samples. All peptides tested induced substantial chemical shift perturbations (CSPs) in MLL4_PHD456_, indicating formation of the corresponding complexes (hereafter we refer to the motif as MPB, for MLL4/3 PHD binding, and to the peptides as ZBED2_MPB_, MED12_MPB_ and MED13_MPB_) (Fig. 1, E–G). Interestingly, irrespective of the NGS score, the patterns of CSPs in the three NMR titration experiments were roughly similar, implying that ZBED2_MPB_, MED12_MPB_ and MED13_MPB_ occupy the same binding site in MLL4_PHD456_. Furthermore, the comparable pattern of CSPs observed in MLL4_PHD456_ upon binding of TET3 (21) suggested that the binding site is in the PHD6 finger of MLL4 (MLL4_PHD6_).
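CSPs of this kind are often quantified by combining the ^1^H and ^15^N shift changes of each amide peak into one weighted value, Δδ = √(Δδ_H_^2^ + (α·Δδ_N_)^2^). The text does not state which weighting, if any, was used, so the sketch below assumes a commonly used α of 0.14 and a user-chosen cutoff; the residue labels and shift values are hypothetical.

```python
import math


def combined_csp(d_h: float, d_n: float, alpha: float = 0.14) -> float:
    """Weighted combined amide CSP in ppm: sqrt(dH^2 + (alpha * dN)^2).

    alpha = 0.14 is an assumed, commonly used 15N scaling factor; the
    study does not report the weighting it used.
    """
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def perturbed_residues(apo, bound, cutoff, alpha=0.14):
    """Return {residue: CSP} for residues whose combined CSP exceeds cutoff.

    apo and bound map residue label -> (1H ppm, 15N ppm). Residues with no
    peak in `bound` are skipped here, because a CSP cannot be computed.
    """
    hits = {}
    for residue, (h0, n0) in apo.items():
        if residue not in bound:
            continue
        h1, n1 = bound[residue]
        csp = combined_csp(h1 - h0, n1 - n0, alpha)
        if csp > cutoff:
            hits[residue] = csp
    return hits


# Hypothetical peak positions for three residues.
apo = {"res1": (8.00, 120.0), "res2": (8.50, 118.0), "res3": (7.90, 115.0)}
bound = {"res1": (8.20, 121.0), "res2": (8.51, 118.1)}
print(perturbed_residues(apo, bound, cutoff=0.1))
```

The α factor rescales the much larger ^15^N shift range so that ^1^H and ^15^N changes contribute on a comparable footing; comparing CSP patterns between titrations, as done above, only makes sense when the same α and cutoff are applied to every data set.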
MLL4_PHD6_ binds to ZBED2_MPB_, MED12_MPB_, and MED13_MPB_
To validate whether MLL4_PHD6_ is responsible for the interaction with ZBED2_MPB_, MED12_MPB_, and MED13_MPB_, we carried out NMR titrations using ^15^N-labeled MLL4_PHD6_. Stepwise addition of ZBED2_MPB_, MED12_MPB_, or MED13_MPB_ to MLL4_PHD6_ caused CSPs on par with those observed in MLL4_PHD456_ (Fig. 2, A–C). Many amide resonances of MLL4_PHD6_ in the apo state disappeared upon addition of the peptides, and another set of resonances corresponding to the peptide-bound state of MLL4_PHD6_ appeared. This slow exchange regime on the NMR chemical shift timescale suggested tight binding, which was confirmed by measuring the binding affinities of MLL4_PHD6_ to FAM-tagged ZBED2_MPB_, MED12_MPB_, and MED13_MPB_ peptides in microscale thermophoresis (MST) assays (Fig. 2, D and E). In agreement with the NMR data, the dissociation constants (K_d_s) were in the range of 0.4 to 0.9 μM (Fig. 2E), indicating that these interactions are slightly tighter than the association of MLL4_PHD6_ with H4K16ac (K_d_ of 1.1 μM) (20). Mass photometry (MP) measurements further supported the formation of a tight complex (Fig. 2F). GST-fused MLL4_PHD456_ (used here because of the lower mass limit of MP) was saturated with MED13_MPB_ at a 1:1 protein:peptide ratio in its dimeric form (Fig. 2, F and G).
Figure 2. MLL4_PHD6_ binds to the MPB motif. A–C, overlaid ^1^H,^15^N HSQC spectra of MLL4_PHD6_ collected in the presence of increasing amounts of (A) ZBED2_MPB_, (B) MED12_MPB_, and (C) MED13_MPB_. Spectra are color coded according to the protein:peptide molar ratio. D, MST binding curves obtained for the interaction of MLL4_PHD6_ with the indicated FAM-labeled peptides. E, binding affinities of MLL4_PHD6_ to the indicated peptides. K_d_s represent the average of three independent measurements ± SEM. Point errors represent SEM. ^a^ Taken from (21). F, MP histogram of GST-MLL4_PHD456_ in the absence or presence of MED13_MPB_. G, table of calculated molecular masses of the complexes relevant to the MP data in (F).
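To put "slightly tighter" on an energy scale, a K_d_ can be converted to a standard binding free energy with ΔG° = RT ln(K_d_/1 M). The sketch assumes 298.15 K (the MST temperature is not stated) and uses only the K_d_ values quoted in the text. Under that assumption, 0.4 and 0.9 μM compared with the 1.1 μM H4K16ac value correspond to differences of roughly 0.6 and 0.1 kcal/mol in favor of the MPB peptides.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol, 1 M standard state."""
    return R_KCAL * temp_k * math.log(kd_molar)


def delta_delta_g(kd_a: float, kd_b: float, temp_k: float = 298.15) -> float:
    """dG(a) - dG(b); a negative value means a binds more tightly than b."""
    return binding_free_energy(kd_a, temp_k) - binding_free_energy(kd_b, temp_k)


# Kd values quoted in the text (MPB peptides vs H4K16ac), in molar units.
for kd_mpb in (0.4e-6, 0.9e-6):
    print(f"{kd_mpb * 1e6:.1f} uM vs 1.1 uM: "
          f"{delta_delta_g(kd_mpb, 1.1e-6):+.2f} kcal/mol")
```

Differences of this size are on the order of RT (about 0.6 kcal/mol at room temperature), which is why the comparison is best described as a modest rather than a decisive preference.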
Mapping the MED13_MPB_-binding site of MLL4_PHD6_
The structure of the MLL4_PHD6_-TET3 complex shows that TET3 is bound in a hydrophobic groove surrounded by negatively charged residues (Fig. 3A). To identify the binding interface for MED13_MPB_ and compare it with the binding site for TET3_MPB_, we analyzed NMR CSPs induced in MLL4_PHD6_ by MED13_MPB_. The residues whose amide signals exhibited line broadening beyond detection, i.e. disappeared upon addition of MED13_MPB_, were mapped onto the MLL4_PHD6_ structure (Fig. 3B). As shown in Fig. 3, A and B, a set of hydrophobic residues, including L1519, L1520, I1521, F1551, and V1560, as well as the negatively charged residues D1518 and E1549, were most perturbed and together revealed an extensive MED13_MPB_-binding pocket. Mutations of Y1514, E1540, and E1544 of MLL4_PHD6_, which form the pocket walls, disrupted or reduced binding, indicating that π and electrostatic contacts are essential for complex formation, whereas mutation of E1516 had no effect (Fig. 3, C–J). Comparison of the binding pocket mapped for MED13_MPB_ with that observed in the complex with TET3_MPB_ suggested that both motif-containing peptides occupy the same binding site of MLL4_PHD6_ (Fig. 3, A and B).
Figure 3. Mapping the MED13_MPB_-binding site of MLL4_PHD6_. A, the structure of the MLL4_PHD6_-TET3 complex (PDB: 8u2y) (21). MLL4_PHD6_ is shown in a surface representation, and TET3 is shown as sticks. B, CSPs (line broadening beyond detection) induced by the addition of 5 molar equivalents of MED13_MPB_ mapped onto the structure of the MLL4_PHD6_-TET3 complex. The perturbed residues are colored red and labeled. Residues forming the pocket walls are labeled. C–H, superimposed ^1^H,^15^N HSQC spectra of wild type or mutated MLL4_PHD6_ collected upon titration with the indicated WT or mutated MED13_MPB_. Spectra are color coded according to the protein:peptide molar ratio. I, representative MST binding curve used to measure the K_d_ value of E1516A MLL4_PHD6_ for FAM-MED13_MPB_. J, binding affinities of WT and mutated MLL4_PHD6_ to the indicated peptides, measured by (^a^) MST and (^b^) NMR.
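Residues "broadened beyond detection" are typically identified by comparing each apo-state peak intensity with its intensity after ligand addition. The sketch below flags residues whose intensity ratio falls below a threshold; the 0.1 cutoff and the peak heights are hypothetical. In slow exchange an apo peak can also vanish because the resonance reappears at a bound-state position, so this check alone does not distinguish broadening from a shift to the bound state.

```python
def broadened_beyond_detection(intensity_apo, intensity_bound, ratio_cutoff=0.1):
    """List residues whose apo peak has (nearly) vanished after ligand addition.

    Both arguments map residue label -> peak height. A residue missing from
    intensity_bound is treated as height 0 (peak not detected). The 0.1
    cutoff is a hypothetical choice, not a value from the study.
    """
    flagged = []
    for residue, i_apo in intensity_apo.items():
        if i_apo <= 0:
            continue  # no usable reference peak in the apo spectrum
        ratio = intensity_bound.get(residue, 0.0) / i_apo
        if ratio < ratio_cutoff:
            flagged.append(residue)
    return flagged


# Hypothetical peak heights (arbitrary units).
apo = {"resA": 100.0, "resB": 100.0, "resC": 100.0}
bound = {"resA": 5.0, "resB": 80.0}  # resC peak not found in the bound spectrum
print(broadened_beyond_detection(apo, bound))  # ['resA', 'resC']
```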
The MPB motif binding activity is conserved in MLL3_PHD7_
Unlike MLL4, which contains seven PHD fingers, MLL3 contains eight PHD fingers (16) (Fig. 4A), and the seventh (MLL3_PHD7_) displays a high degree of sequence similarity to MLL4_PHD6_ (Fig. 4B). We therefore tested by NMR and MST whether the tryptophan-phenylalanine motif binding activity is conserved in MLL3_PHD7_. Gradual addition of MED13_MPB_ or TEX13B_MPB_ to ^15^N-labeled MLL3_PHD7_ led to large CSPs and notable disappearance of amide signals, indicating that, like MLL4_PHD6_, MLL3_PHD7_ recognizes the MPB motif (Fig. 4, C and D). In further support, binding affinities of 0.7 μM and 0.6 μM were measured for the interaction of MLL3_PHD7_ with FAM-MED13_MPB_ and FAM-TEX13B_MPB_, respectively (Fig. 4E). Another PHD finger of MLL3 (MLL3_PHD4_), whose position is somewhat similar to that of MLL3_PHD7_ (both are terminal PHD fingers that follow a single zinc knuckle in their respective zinc-finger cassettes), was unable to bind MED13_MPB_, underscoring the binding specificity of MLL4_PHD6_ and MLL3_PHD7_ toward the MPB motif (Fig. 4F). An AlphaFold-predicted model of MLL3_PHD4_ suggested the presence of a groove, albeit smaller than that in MLL3_PHD7_; together with a dissimilar surface charge distribution arising from differences in the respective sequences, this likely precludes association with the MPB motif (Fig. 4, B and G).
Figure 4. The MPB motif recognition is conserved in MLL3_PHD7_. A, domain architecture of MLL3. MLL3_PHD7_ is colored green. B, alignment of the MLL4_PHD6_, MLL3_PHD7_ and MLL3_PHD4_ sequences. The MPB-binding site residues of MLL4_PHD6_ are labeled above the sequences. Residues coordinating two zinc ions are indicated below the sequences. C and D, overlaid ^1^H,^15^N HSQC spectra of MLL3_PHD7_ collected in the presence of increasing concentrations of (C) MED13_MPB_ and (D) TEX13B_MPB_ (aa 148–163). Spectra are color coded according to the protein:peptide molar ratio. E, MST binding curves used to determine the binding affinities of MLL3_PHD7_ to FAM-labeled MED13_MPB_ and TEX13B_MPB_. F, overlaid ^1^H,^15^N HSQC spectra of MLL3_PHD4_ collected in the presence of increasing concentrations of MED13_MPB_. Spectra are color coded according to the protein:peptide molar ratio. G, electrostatic surface potential of the AlphaFold2-predicted model of MLL3_PHD4_ and of the structure of MLL3_PHD7_ (PDB: 6mlc), with blue and red representing positive and negative surface charges, respectively. Only TET3_MPB_ is shown in the structural overlay of the MLL4_PHD6_-TET3 complex (PDB: 8u2y) with MLL3_PHD7_ (PDB: 6mlc).
MED13 and MLL3 colocalize on promoters
Several studies indicate a relationship between MLL4/MLL3 and the Mediator complex, which recruits RNA Pol II to promoters to stimulate gene expression and acts as a bridge between enhancers and promoters (25, 26). MED13 and MED12 are components of the four-subunit Mediator kinase module (MKM), a stable complex that reversibly associates with the Mediator complex but also functions independently of Mediator (25). The MKM-Mediator subunits contain extensive IDRs, enabling binding of a diverse array of cofactors, which is often accompanied by large conformational and allosteric changes. MLL3/MLL4 also modulate enhancer-promoter interactions and are required for the formation of enhancer-promoter contacts (13, 27, 28, 29). Furthermore, deletion of MLL4 decreases Mediator and Pol II levels on enhancers and leads to severe defects in gene expression and cell differentiation (8). To explore the functional cooperation of endogenous MLL3 and MED13, we analyzed previously reported ChIP-seq data for MLL3 in CAL51 cells (30) and for MED13 in HepG2 cells (ENCODE). A strong MLL3 signal in the heat map at MED13-bound chromatin regions indicated substantial co-occupancy of MED13 and MLL3, and both MED13 and MLL3 binding sites were enriched at promoters (Fig. 5, A and B). Further analysis of peak overlap revealed that, of the 15,713 MED13 binding sites and 34,590 MLL3 binding sites, 9,136 were co-occupied by MED13 and MLL3, corresponding to 58% of MED13 peaks and 26% of MLL3 peaks (Fig. 5C). Colocalization of MED13 and MLL3 was also observed on the promoters of target genes (Fig. 5D). Overall, these analyses suggest a potential functional relationship and/or co-regulation of gene expression by MED13 and MLL3; however, studies of the genomic localization of both proteins in the same cell line are needed to validate and fully understand this relationship.
Figure 5. MLL3 and MED13 colocalize across the genome. A, heatmaps of MLL3 and MED13 signal intensity at MED13-bound genomic regions in the CAL51 (MLL3) and HepG2 (MED13) cancer cell lines (GSE97326 and GSM5214336). B, genomic distributions of MED13 and MLL3 binding sites. C, overlap in MLL3 and MED13 binding sites. D, selected gene tracks of MLL3 and MED13 ChIP-seq data.
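Peak co-occupancy of this kind (computed with bedtools in the study) reduces to counting the intervals in one peak set that overlap at least one interval in another. The sketch below is a pure-Python stand-in, not bedtools itself, assuming half-open BED coordinates and a minimum overlap of 1 bp. Because a single MLL3 peak can overlap several MED13 peaks, counts taken from the two directions can differ. The last lines check the percentages quoted in the text.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def count_overlapping(peaks_a, peaks_b):
    """Count peaks in peaks_a overlapping >= 1 bp of any peak in peaks_b.

    Peaks are (chrom, start, end) tuples in half-open BED coordinates.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        # Running maximum of end coordinates over intervals sorted by start.
        max_ends = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_ends)

    hits = 0
    for chrom, start, end in peaks_a:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        k = bisect_left(starts, end)  # peaks in b that start before this one ends
        if k and max_ends[k - 1] > start:  # ...and at least one ends after it starts
            hits += 1
    return hits


def overlap_percentages(n_shared, n_a, n_b):
    """Share of each peak set (in %) represented by the shared peaks."""
    return 100 * n_shared / n_a, 100 * n_shared / n_b


# Counts quoted in the text: 9,136 shared, 15,713 MED13 and 34,590 MLL3 peaks.
med13_pct, mll3_pct = overlap_percentages(9136, 15713, 34590)
print(f"{med13_pct:.0f}% of MED13 peaks, {mll3_pct:.0f}% of MLL3 peaks")
# 58% of MED13 peaks, 26% of MLL3 peaks
```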
Expression of MLL4/MLL3 and the MPB motif-containing proteins correlates in cancer
Aberrant catalytic activities, mutations, and mislocalization of MLL4/MLL3 within the genome have been associated with the development and progression of various cancers (29, 31). Our analysis of a set of TCGA cancers indicated that MLL4 expression correlates with the expression of several proteins identified in the proteomic peptide-phage screen (Figs. 1C and 6A). The most prominent positive correlations were observed with MED13 and MED12, as well as with TET3, a dioxygenase that converts methylated cytosine into the oxidized derivative hydroxymethylcytosine (Fig. 6A). The correlations were confirmed through analysis of the TCGA PanCancer and GDC (Genomic Data Commons) datasets of cancer lineages. A strong positive MLL4-MED13 correlation that was consistent across both datasets was observed particularly for THYM (thymoma) and DLBC (diffuse large B-cell lymphoma) (Fig. 6B). Consistent with this analysis, MLL4 and MED13 were upregulated in these tumors compared with normal tissues (Fig. 6, C and D). Furthermore, the large number of genes positively correlated with both MLL4 and MED13 suggested an overlapping role of MLL4 and MED13 in transcriptional programs in THYM and DLBC (Fig. 6E). Gene ontology analysis of functional enrichment revealed a high degree of association with transcription-permissive histone modifications and active gene transcription (Fig. 6, F and G). MLL3 and MED12, but not TET3, were also significantly upregulated in THYM (Fig. 6, H–J). Collectively, these analyses support previous findings demonstrating that, in addition to acting as tumor suppressors (15), MLL4/MLL3 can also promote cancer cell proliferation and tumor progression in a context- or tissue-specific manner (32, 33). These results further suggest that MLL4/MLL3 and distinct tryptophan-phenylalanine-containing proteins may have synergistic roles in pro-oncogenic transcriptional programs. Alternatively, cooperation of upregulated MLL4/MLL3 and MED13/MED12 could play a role in shutting down transcription of tumor suppressor genes, as the MKM has been shown to prevent the association of the Mediator with RNA Pol II (25, 34).
Figure 6. Correlative analyses of MLL4 and the MPB motif-containing proteins. A, pairwise mRNA-expression correlations of MLL4 and the indicated proteins across TCGA cancer lineages. Hierarchical clustering of gene-wise correlation profiles is shown; clustering highlights the relative similarity of correlations with MLL4. B, cross-dataset consistency of the MLL4-MED13 correlation. Correlation coefficients were computed using the TCGA GDC and TCGA PanCancer datasets. THYM, thymoma; DLBC, diffuse large B-cell lymphoma. C and D, MLL4 and MED13 mRNA levels in THYM and DLBC TCGA tumors and TCGA + GTEx normal tissues. Statistical significance: ∗p < 0.05, by unpaired Student's t test. E, genes positively correlated with MLL4 and MED13 identified in THYM and DLBC in cBioPortal (r ≥ 0.7 and p ≤ 0.05). F and G, gene ontology/molecular function enrichment of the overlapping correlated genes from (E) in THYM and DLBC. H–J, MLL3, MED12 and TET3 mRNA levels in THYM and DLBC TCGA tumors and TCGA + GTEx normal tissues. ∗p < 0.05, by unpaired Student's t test.
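The lineage-wise correlations were Spearman coefficients computed in cBioPortal. For reference, Spearman's ρ is the Pearson correlation of the two variables' ranks, with tied values given their average rank; the sketch below implements exactly that on hypothetical expression values and does not compute a p-value.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values for two genes across six tumor samples.
gene_a = [5.1, 6.3, 7.0, 7.0, 8.2, 9.4]
gene_b = [2.0, 2.4, 3.1, 2.9, 3.8, 4.5]
print(round(spearman(gene_a, gene_b), 3))
```

Working on ranks makes the coefficient insensitive to monotone transformations such as log scaling, which is one reason rank correlation is a common choice for co-expression across heterogeneous tumor cohorts.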
In conclusion, we identified a tryptophan-phenylalanine binding motif recognized by MLL4 and MLL3. Our proteomic phage screening analysis shows that this motif is present within the intrinsically disordered regions of a set of human proteins. We validated five MPB motif-containing peptides and confirmed their direct interactions with MLL4_PHD6_ or MLL3_PHD7_ using biochemical assays and mutagenesis. A high degree of genomic co-occupancy and a strong correlation between the expression of MLL4/MLL3 and that of the motif-containing proteins suggest a potential functional link in transcription. Binding to this motif may represent a general mechanism for the recruitment or stabilization of MLL4/MLL3 at chromatin regions bound by motif-containing proteins, or vice versa. Although all potential interactions clearly require experimental validation, the association of MLL4/MLL3 with the MPB motif derived from the five proteins characterized in this work suggests that additional interactors are likely to be identified. As in the case of canonical SH2, PDZ, ET and other domains known to recognize motifs in a wide array of proteins, the mechanism by which MLL4/MLL3 select a particular binding partner in diverse cellular processes requires comprehensive studies. The extensive IDRs of MLL4/MLL3 and their multiple PHD fingers with non-redundant functions may contribute to such selectivity, allowing MLL4/MLL3 to engage distinct binding partners on demand.
Experimental procedures
Protein expression and purification
MLL4_PHD456_ (aa 1376–1562) and MLL4_PHD6_ (aa 1503–1562) of human MLL4 and MLL3_PHD4_ (aa 464–520) and MLL3_PHD7_ (aa 1083–1143) of human MLL3 were expressed in Rosetta2 (DE3) pLysS cells grown in Luria Broth or in minimal medium (M9) supplemented with ^15^NH_4_Cl and 50 μM ZnCl_2_. After induction with IPTG (0.5 mM) overnight at 16 °C, bacteria were harvested by centrifugation. For all GST-tagged proteins (except GST-MLL4_PHD456_, see below), pellets were resuspended in 50 mM Tris pH 7.5 buffer supplemented with 200 to 500 mM NaCl and 2 to 3 mM dithiothreitol (DTT), and the suspensions were lysed by sonication on ice. Proteins were purified on glutathione agarose or Sepharose 4B beads (Thermo Scientific), and the GST tag was cleaved with thrombin or TEV protease overnight at 4 °C (Fig. S1). Proteins were further purified by size exclusion chromatography and concentrated in Millipore concentrators. All mutants were generated by site-directed mutagenesis using the Stratagene QuikChange protocol and were expressed and purified as the WT proteins.
The pellet (inclusion bodies) of GST-MLL4_PHD456_ was dissolved in 50 mM Tris pH 7.5 buffer supplemented with 150 mM NaCl, 5 mM DTT and 6 M urea. The protein solution was centrifuged, and the supernatant was applied to a DEAE-cellulose column pre-equilibrated with the same buffer. GST-MLL4_PHD456_ was eluted with buffer (50 mM Tris pH 7.5, 1 M NaCl, 5 mM DTT and 6 M urea) and dialyzed against buffer (50 mM Tris pH 7.5, 150 mM NaCl and 5 mM DTT) for 24 h to remove urea. The GST tag was cleaved with PreScission protease, and MLL4_PHD456_ was purified by size exclusion chromatography and concentrated in a Millipore concentrator.
ProP-PD selection and data analysis
The HD2 phage library (22) was used for four rounds of phage display selections. GST was used for negative pre-selection to remove non-specific binders, and MLL4_PHD456_ was used as bait. 10 μg of protein in 100 μl phosphate-buffered saline (PBS, 137 mM NaCl, 2.7 mM KCl, 95 mM Na_2_HPO_4_, 15 mM KH_2_PO_4_ pH 7.5) was immobilized in a 96-well MaxiSorp plate (Nunc) overnight at 4 °C while shaking. The wells were blocked with 0.5% BSA in PBS for 1 h at 4 °C while shaking and then washed four times with 200 μl PT buffer (PBS, 0.05% (v/v) Tween20). The phage library (10^11^ pfu in 100 μl PBS) was incubated in the GST-coated wells for 1 h at 4 °C, then transferred to the bait-coated wells and incubated for 2 h at 4 °C under gentle agitation. The phage solution was aspirated, and the wells were washed five times with 200 μl PT buffer. Unbound phages were removed by washing five times with 0.05% Tween20 in PBS. The bound phages were eluted with 100 μl/well log-phase E. coli OmniMAX by incubating for 30 min at 37 °C while shaking. M13KO7 helper phages (10^11^ pfu/ml) were added to each well and incubated for 45 min at 37 °C while shaking. The bacterial cultures were transferred to 1 ml 2xYT supplemented with carbenicillin (100 μg/ml), kanamycin (30 μg/ml), and 0.3 mM IPTG and grown overnight at 37 °C while shaking. The bacterial cultures were pelleted by centrifugation at 3500 × g for 10 min at 4 °C. The pH of the phage supernatant was adjusted with 1/10th volume of 10x PBS, and the samples were then incubated at 65 °C for 10 min. The phage pool from each round was used for the next round of selection, for a total of four rounds. The enrichment of binding phages in each round of selection was assessed by phage pool ELISA. The peptide-coding regions of binding-enriched phage pools were PCR-amplified and barcoded with Phusion High-Fidelity polymerase (Thermo Scientific). The PCR products were normalized with Mag-Bind TotalPure NGS and cleaned up from a 2% agarose gel using the QIAquick Gel Extraction Kit (Qiagen). The samples were sequenced on an Illumina MiSeq v3 (1 × 150 bp reads, 20% PhiX) by the NGS-NGI SciLifeLab facility. The results were processed as previously described (35). Peptide sequences were annotated using PepTools (22).
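The "% NGS counts" reported in Fig. 1C express each peptide's read count as a share of all reads. The actual processing followed a published pipeline (35) and PepTools (22); the sketch below only illustrates that final normalization step on hypothetical, already-decoded peptide reads.

```python
from collections import Counter


def percent_counts(peptide_reads):
    """Return (peptide, % of all reads) pairs, most abundant first."""
    counts = Counter(peptide_reads)
    total = sum(counts.values())
    return [(peptide, 100.0 * n / total) for peptide, n in counts.most_common()]


# Hypothetical peptides decoded from sequencing reads of one selection round.
reads = ["PEPTIDEA", "PEPTIDEB", "PEPTIDEA", "PEPTIDEC", "PEPTIDEA", "PEPTIDEB"]
for peptide, pct in percent_counts(reads):
    print(f"{peptide}\t{pct:.1f}%")
```

Because the percentages are relative to the whole pool, a peptide's value depends on how many other binders were enriched in the same selection, which is one reason NGS scores alone were not used to rank the validated ligands above.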
NMR experiments
NMR experiments were carried out at 298 K on Varian INOVA 600 MHz and 900 MHz and Bruker 600 MHz spectrometers, as described (36). Briefly, ^1^H,^15^N HSQC spectra of 0.1 mM uniformly ^15^N-labeled WT or mutated MLL4_PHD456_, MLL4_PHD6_, MLL3_PHD7_, and MLL3_PHD4_ were recorded while WT or mutated peptides (synthesized by SynPeptide) were added stepwise. The experiments were carried out in 50 mM Tris (pH 7), 150 mM NaCl, and 2.5 mM DTT buffer.
Microscale thermophoresis (MST)
MST experiments were performed using a Monolith NT.115 instrument (NanoTemper). All experiments were performed using SEC-purified MLL4_PHD6_ and MLL3_PHD7_ in a buffer containing 50 mM Tris-HCl pH 7.0, 150 mM NaCl and 2.5 mM DTT. Dissociation constants were determined using a direct binding assay (37) in which unlabeled proteins (MLL4_PHD6_ and MLL3_PHD7_) were varied in concentration by serial dilution of discrete samples. FAM-labeled peptides (synthesized by SynPeptide) were maintained at a final concentration of 100 nM for all samples. The measurements were performed at 40% LED and medium MST power with 3 s pre-laser time, 20 s laser on-time and 1 s off-time. The K_d_ values were calculated using MO Affinity Analysis software (NanoTemper) (n = 3 for MLL4_PHD6_ and MLL3_PHD7_). Plots were generated in GraphPad PRISM.
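A direct-binding MST experiment with a fixed concentration of labeled peptide (100 nM here) is commonly fitted with a 1:1 quadratic model that accounts for ligand depletion. The sketch below assumes that model (the study used MO Affinity Analysis, whose internals are not reproduced here), fits K_d_ by a log-spaced grid search with the baseline and amplitude solved by linear least squares, and is demonstrated on synthetic, noise-free data rather than measured values.

```python
import math


def fraction_bound(protein_m, ligand_m, kd_m):
    """Fraction of labeled ligand bound in a 1:1 model with ligand depletion."""
    s = protein_m + ligand_m + kd_m
    root = math.sqrt(max(0.0, s * s - 4.0 * protein_m * ligand_m))
    return (s - root) / (2.0 * ligand_m)


def fit_kd(protein_m, signal, ligand_m=100e-9, log10_range=(-9.0, -4.0), steps=2001):
    """Grid-search Kd (M); baseline and amplitude solved by linear least squares.

    Returns (kd, unbound_signal, bound_minus_unbound).
    """
    lo, hi = log10_range
    n = len(signal)
    mean_y = sum(signal) / n
    best = None
    for i in range(steps):
        kd = 10 ** (lo + (hi - lo) * i / (steps - 1))
        f = [fraction_bound(p, ligand_m, kd) for p in protein_m]
        mean_f = sum(f) / n
        var_f = sum((x - mean_f) ** 2 for x in f)
        if var_f == 0:
            continue  # curve is flat over this range; no information
        slope = sum((x - mean_f) * (y - mean_y) for x, y in zip(f, signal)) / var_f
        intercept = mean_y - slope * mean_f
        sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(f, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, intercept, slope)
    return best[1:]


# Synthetic, noise-free titration: 16-point 1:1 dilution from 20 uM,
# generated with Kd = 0.6 uM and 100 nM labeled peptide.
conc = [20e-6 / 2 ** i for i in range(16)]
signal = [800 + 20 * fraction_bound(p, 100e-9, 0.6e-6) for p in conc]
kd, baseline, amplitude = fit_kd(conc, signal)
print(f"Kd ~ {kd * 1e6:.2f} uM")
```

With 100 nM labeled peptide and K_d_ values around 0.4 to 0.9 μM, ligand depletion is modest but not negligible, which is why the quadratic form is used here instead of the simpler hyperbolic P/(P + K_d_) approximation.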
Mass photometry (MP)
MP experiments were conducted on a Refeyn TwoMP mass photometer (Refeyn Ltd) as described (38). To overcome the lower mass limit of the instrument, a GST-fusion construct, GST-MLL4_PHD456_, was used. Experiments were performed on 10 nM GST-MLL4_PHD456_ in the absence and presence of MED13_MPB_ peptide. 10 μl of each sample was loaded into the sample wells of silicone cassettes assembled onto MassGlass UC coverslips (Refeyn Ltd). Measurements were performed at room temperature in buffer (25 mM Tris pH 7.0, 150 mM NaCl, 2.5 mM DTT). β-amylase was used as a calibration standard (56 kDa, 112 kDa, and 224 kDa). After the focus was set and locked, movies were captured for 60 s (2800 frames) using AcquireMP software (Refeyn Ltd). Data were processed using DiscoverMP (Refeyn Ltd).
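Mass photometry converts each particle's ratiometric contrast to mass through a linear calibration against standards of known mass, here the β-amylase species at 56, 112 and 224 kDa. The sketch assumes a straight-line least-squares fit; the contrast values are hypothetical, and DiscoverMP performs this step internally.

```python
def fit_calibration(masses_kda, contrasts):
    """Least-squares line: contrast = slope * mass + intercept."""
    n = len(masses_kda)
    mean_m = sum(masses_kda) / n
    mean_c = sum(contrasts) / n
    sxx = sum((m - mean_m) ** 2 for m in masses_kda)
    sxy = sum((m - mean_m) * (c - mean_c) for m, c in zip(masses_kda, contrasts))
    slope = sxy / sxx
    return slope, mean_c - slope * mean_m


def contrast_to_mass(contrast, slope, intercept):
    """Invert the calibration line to estimate mass in kDa."""
    return (contrast - intercept) / slope


# beta-amylase species (kDa) from the text; contrast values are hypothetical.
standards_kda = [56.0, 112.0, 224.0]
standard_contrasts = [-0.0028, -0.0056, -0.0112]
slope, intercept = fit_calibration(standards_kda, standard_contrasts)
print(f"{contrast_to_mass(-0.0042, slope, intercept):.1f} kDa")  # 84.0 kDa
```

Once calibrated, the measured masses in Fig. 2F can be compared with the expected masses of the apo and peptide-bound species tabulated in Fig. 2G.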
ChIP-seq analysis
The MLL3 ChIP-seq dataset from CAL51 cells was obtained from Gene Expression Omnibus entry GSE97326. Briefly, raw fastq reads were downloaded, preprocessed with fastp, mapped with Bowtie2, filtered with Samtools using default settings, de-duplicated, and used for peak calling with MACS2. Heat maps were generated using deepTools, and peak overlap analyses were performed with bedtools. MED13 ChIP-seq data (peak sets and bigwig files) from HepG2 cells were generated by the ENCODE project (GSM5214336) and downloaded via the Cistrome Data Browser.
Genomic correlations analysis
Pairwise correlation analysis and clustering. Correlation and co-expression analyses were performed in cBioPortal using TCGA GDC (Genomic Data Commons) datasets across 29 cancer lineages. TCGA PanCancer datasets were used for cross-dataset validation. For each lineage, pairwise Spearman correlations were obtained between MLL4 and MED13, MED12, TET3, TEX13B, ZBED2, VRK2, and GEMIN2. Gene-wise correlation profiles (per lineage) were assembled into a matrix and subjected to hierarchical clustering of genes by Euclidean distance using public R packages. Cross-dataset consistency between the GDC and PanCancer datasets for the MLL4-MED13 correlation was assessed using the Spearman method.
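The clustering of gene-wise correlation profiles can be illustrated with a small agglomerative procedure on Euclidean distances. The text does not name the linkage method or the R package; complete linkage is assumed below (it is, for example, the default method of R's `hclust`). Gene names and correlation values are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(profiles):
    """Naive agglomerative clustering with complete linkage.

    profiles maps gene -> per-lineage correlations (same lineage order for all).
    Returns merges as (cluster_1, cluster_2, height) in the order they occur.
    """
    clusters = [(gene,) for gene in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: the farthest pair of members sets the distance.
            d = max(euclidean(profiles[a], profiles[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merges[-1][0] + merges[-1][1])
    return merges


# Hypothetical correlation-with-MLL4 profiles over two lineages.
profiles = {"GENE1": [0.80, 0.70], "GENE2": [0.75, 0.72], "GENE3": [0.10, 0.00]}
for left, right, height in complete_linkage(profiles):
    print(left, right, round(height, 3))
```

This naive version recomputes all pairwise distances at each step and is only meant for a handful of genes, as in Fig. 6A; real analyses would use an optimized library routine.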
Tumor vs normal differential expression analysis was performed using the GEPIA2 platform. mRNA expression levels were compared between TCGA tumors and matched TCGA + GTEx normal tissues (THYM: tumors N = 118, normal N = 339; DLBC: tumors N = 47, normal N = 337). Normal samples are blood samples or normal adjacent tissues (NAT). Statistical significance was assessed with the unpaired t test implemented in the GEPIA2 expression analysis tool.
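The figure captions describe an unpaired Student's t test for the tumor vs normal comparisons. The sketch below computes the classical two-sample t statistic with pooled variance and its degrees of freedom on hypothetical values; converting t to a p-value requires the t distribution (for example from SciPy) and is omitted to keep the example dependency-free.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t with pooled variance; returns (t, df)."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical log-scale expression values for tumor and normal samples.
tumor = [5.0, 6.0, 7.0]
normal = [1.0, 2.0, 3.0]
t, df = student_t(tumor, normal)
print(f"t = {t:.3f}, df = {df}")
```

The pooled-variance form assumes equal variances in both groups; with group sizes as unbalanced as those listed above (for example 47 tumors vs 337 normals), Welch's variant is a common alternative.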
Co-expression overlap and GO analysis. Within DLBC and THYM, the cBioPortal co-expression tool was used to identify genes positively correlated with MLL4 and with MED13 using thresholds of r ≥ 0.7 and p ≤ 0.05. Counts of unique and overlapping genes were computed. Shared positively correlated genes were analyzed with ShinyGO v0.85 at an FDR of 0.05 and a term size of 2 to 2000. The top 15 representative enriched GO Molecular Function categories are reported. Exact n (lineages, overlaps) and thresholds (r, p, FDR, term size) are indicated.
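The overlap of MLL4- and MED13-co-expressed genes amounts to thresholding each gene list (r ≥ 0.7 and p ≤ 0.05) and intersecting the results. The sketch assumes each list is available as a mapping from gene to (r, p), for instance as exported from the cBioPortal co-expression view; the gene names and values are hypothetical.

```python
def positively_correlated(results, r_min=0.7, p_max=0.05):
    """Genes with r >= r_min and p <= p_max; results maps gene -> (r, p)."""
    return {gene for gene, (r, p) in results.items() if r >= r_min and p <= p_max}


def coexpression_overlap(results_a, results_b, r_min=0.7, p_max=0.05):
    """Unique and shared positively correlated genes for two query genes."""
    a = positively_correlated(results_a, r_min, p_max)
    b = positively_correlated(results_b, r_min, p_max)
    return {"unique_a": sorted(a - b), "unique_b": sorted(b - a), "shared": sorted(a & b)}


# Hypothetical (r, p) values for co-expression with MLL4 and with MED13.
with_mll4 = {"GENEX": (0.80, 0.010), "GENEY": (0.72, 0.200), "GENEZ": (0.90, 0.001)}
with_med13 = {"GENEX": (0.75, 0.010), "GENEZ": (0.60, 0.010), "GENEW": (0.71, 0.040)}
print(coexpression_overlap(with_mll4, with_med13))
```

The "shared" list is the kind of input that would then be submitted for GO enrichment with the FDR and term-size settings given above.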
Data availability
All relevant data supporting the key findings of this study are available within the article.
Code availability
This paper does not report original code.
Supporting information
This article contains supporting information.
Conflict of interest
The authors declare that they do not have any conflicts of interest with the content of this article.
References
- 1. Froimchuk E., Jang Y., Ge K. Histone H3 lysine 4 methyltransferase KMT2D. Gene 627 (2017) 337–342. doi:10.1016/j.gene.2017.06.056. PMID: 28669924. PMCID: PMC5546304.
- 2. Sze C.C., Shilatifard A. MLL3/MLL4/COMPASS family on epigenetic regulation of enhancer function and cancer. Cold Spring Harb. Perspect. Med. 6 (2016) a026427. doi:10.1101/cshperspect.a026427. PMID: 27638352. PMCID: PMC5088509.
- 3. Zhao Z., Cao K., Watanabe J., Philips C.N., Zeidner J.M., Ishi Y. Therapeutic targeting of metabolic vulnerabilities in cancers with MLL3/4-COMPASS epigenetic regulator mutations. J. Clin. Invest. 133 (2023) e169993. doi:10.1172/JCI169993. PMID: 37252797. PMCID: PMC10313365.
- 4. Dhar S.S., Brown C., Rizvi A., Reed L., Kotla S., Zod C. Heterozygous Kmt2d loss diminishes enhancers to render medulloblastoma cells vulnerable to combinatory inhibition of LSD1 and OXPHOS. Cell Rep. 44 (2025) 115619. doi:10.1016/j.celrep.2025.115619. PMID: 40286267. PMCID: PMC12324069.
- 5. Zhao Z., Aoi Y., Philips C.N., Meghani K.A., Gold S.R., Yu Y. Somatic mutations of MLL4/COMPASS induce cytoplasmic localization providing molecular insight into cancer prognosis and treatment. Proc. Natl. Acad. Sci. U. S. A. 120 (2023) e2310063120. doi:10.1073/pnas.2310063120. PMID: 38113256. PMCID: PMC10756272.
- 6. Van H.T., Xie G., Dong P., Liu Z., Ge K. KMT2 family of H3K4 methyltransferases: enzymatic activity-dependent and -independent functions. J. Mol. Biol. 436 (2024) 168453. doi:10.1016/j.jmb.2024.168453. PMID: 38266981. PMCID: PMC10957308.
- 7. Rao R.C., Dou Y. Hijacked in cancer: the KMT2 (MLL) family of methyltransferases. Nat. Rev. Cancer 15 (2015) 334–346. doi:10.1038/nrc3929. PMID: 25998713. PMCID: PMC4493861.
- 8. Lee J.E., Wang C., Xu S., Cho Y.W., Wang L., Feng X. H3K4 mono- and di-methyltransferase MLL4 is required for enhancer activation during cell differentiation. eLife 2 (2013) e01503. doi:10.7554/eLife.01503. PMID: 24368734. PMCID: PMC3869375.
