Systematic Profiling of Peptide Substrate Specificity in N‑Terminal Processing by Methionine Aminopeptidase Using mRNA Display and an Unnatural Methionine Analogue

Raphael J. Turra; Satoru Horiya; Mahesh Neralkar; Jennifer K. Bailey; Timothy J. Walsh; Viktor Horvath; Isaac J. Krauss

PMC · DOI:10.1021/acschembio.5c00680·January 8, 2026

Systematic Profiling of Peptide Substrate Specificity in N‑Terminal Processing by Methionine Aminopeptidase Using mRNA Display and an Unnatural Methionine Analogue

Raphael J. Turra, Satoru Horiya, Mahesh Neralkar, Jennifer K. Bailey, Timothy J. Walsh, Viktor Horvath, Isaac J. Krauss

PDF

Open Access

TL;DR

This study uses a high-throughput method to understand how an enzyme called methionine aminopeptidase (MAP) processes peptides with modified starting amino acids.

Contribution

The paper introduces a systematic profiling of MAP's substrate specificity using an unnatural methionine analogue and mRNA display.

Findings

01

The second residue after the cleavage site (P1′ position) most strongly affects MAP cleavage efficiency.

02

Residues further downstream have moderate impacts on cleavage efficiency.

03

Results align with previous trends observed for natural substrates and are supported by enzyme–substrate modeling.

Abstract

Methionine aminopeptidase (MAP) is useful in chemical biology research for the N-terminal processing of peptides and proteins and in medicine as a potential therapeutic target. These technologies can benefit from a precise understanding of the enzyme’s substrate specificity profiled over a wide chemical space, including not just natural substrates, peptides containing N-terminal Met, but also unnatural peptide substrates containing N-terminal Met analogues that are also cleaved by MAP like homopropargylglycine (HPG) and azidohomoalanine (AHA). A few studies have profiled substrate specificity for cleavage of N-terminal Met, but none have systematically done so using N-terminal Met analogues. Therefore, we devised a high-throughput profiling experiment based on mRNA display and next-generation sequencing to probe MAP’s substrate specificity using N-terminal HPG. From subgroup analysis of…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Proteins1

Chemicals5

Figures7

Click any figure to enlarge with its caption.

(a) N-terminal processing in prokaryotes by the combined activities of PDF and MAP to cleave formyl-Met. (b) Analogous N-terminal processing occurs with methionine analogues like formyl-HPG in the case of residue-specific noncanonical amino acid incorporation.

MAP library design. Although the final leucine residue is encoded in the ORF, it is not translated during the mRNA–peptide fusion generation due to RNA modification by psoralen cross-linking at this site.

Selection scheme. See Figure S1 for workflow details on post-translational purification and enzymatic digestion. Library fractions sequenced by NGS are indicated by boxes.

Estimated percent cleavage values from subgroup analyses. Values from the P1′ subgroup analysis are presented as a ranked list (a) and plotted against the residue side-chain size (b). Values from P2′ (c), P3′ (d), and P4′ (e) subgroup analyses with selected P1′ = {Asn, Ile, Gln, Glu, Leu, or Asp} are presented only as ranked lists. Colors are used to qualitatively define cleavage efficiency tiers, with blue assigned to tier 1, orange to tier 2, and gray to tier 3. Data are averaged between variable-long and variable-short libraries, and error bars represent the standard deviation. Individual library data are given in Figures S4, S6–S8.

(a) Electrostatic potential surface of the MAP enzyme as calculated from the published crystal structure (PDB: 2MAT) using PyMOL 1.8 with the APBS plugin. , (b) AlphaFold-predicted structure (ipTM = 0.96, pTM = 0.97) of MAP (colored by electrostatic potential calculation) in complex with short peptide MAWRVS (see Figure S10 for other peptides modeled).

Heatmaps of percent cleavage calculated from subgroup analyses of two-residue combinations: P1′ × P2′ using the averaged variable libraries (a), P2′ × P3′ using averaged fixed libraries (fixed P1′ Ala) (b), and P2′ × P3′ using averaged variable libraries (fixed P1′ Asn, Ile, Gln, Glu, Leu, or Asp) (c). Columns and rows are sorted by average values.

Percent cleavage observed for synthetic heptapeptides with N-terminal Met (a) or HPG (b,c) treated with MAP and quantified by LC/MS (triplicate data). Assays were run either with 0.43 μM MAP and excess peptide substrate (180 μM) (a,b) or with high MAP concentration (15 μM) equimolar with peptide (15 μM) (c). Peptides start with the sequence displayed (including N-terminal Met or HPG (=X)), followed by TYNK (except for XNWR and XNWD, which have R and D, respectively, at their P3′ position). Error bars represent standard deviation. Colors correspond to cleavage efficiency tiers as in Figure (blue = tier 1; orange/brown = tier 2; gray = tier 3).

Funding2

—National Institute of Allergy and Infectious Diseases10.13039/100000060
—National Institute of Allergy and Infectious Diseases10.13039/100000060

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsPeptidase Inhibition and Analysis · Neuropeptides and Animal Physiology · Antimicrobial Peptides and Activities

Full text

Introduction

Methionine aminopeptidase (MAP) has demonstrated utility as both a tool for chemical biology and a potential therapeutic target. The natural function of MAP is to cleave the obligatory N-terminal Met residue from nascent peptides either alone (as in eukaryotes) or in conjunction with peptide deformylase (PDF) (as in prokaryotes) (Figurea). Medically, inhibition of bacterial MAP is a potential antibiotic strategy, while inhibition of human MAP2 may be beneficial for combatting obesity and cancer. ?,? In chemical biology research, noncanonical amino acid (ncAA) Met analogues like homopropargylglycine (HPG) and azidohomoalanine (AHA) have been used to residue-specifically incorporate chemical handles into proteins for bioorthogonal coupling of probes and other functional conjugates. ?,? Since the activity of MAP extends to cleavage of N-terminal Met analogues, MAP enables removal of an N-terminal conjugation site when only internal conjugation is desirable (Figureb). For example, we recently employed MAP in the construction of glycopeptide mRNA display libraries in which CuAAC glycan attachment occurred only at internal sites.?

(a) N-terminal processing in prokaryotes by the combined activities of PDF and MAP to cleave formyl-Met. (b) Analogous N-terminal processing occurs with methionine analogues like formyl-HPG in the case of residue-specific noncanonical amino acid incorporation.

Underlying all these applications is the need to understand MAP’s substrate preference to ensure either complete cleavage or complete inhibition of cleavage. Previous work on this question has focused on MAP’s natural substrates, peptides containing N-terminal Met. The two most comprehensive of these studies confirm that while the identity of the residue at the second position (P1′) has the greatest impact on cleavage success, the identity of residues further downstream can also have a notable impact. ?,? While N-terminal Met has been thoroughly studied, ?−? ? ? ? no such comprehensive and systematic study has been conducted to determine MAP substrate specificity of peptides containing N-terminal Met analogues. Smaller studies so far have demonstrated that while the rules determining cleavage efficiency appear similar to those for peptides containing N-terminal Met, there may be some critical differences. ?−? ?

We opted to use mRNA display to profile the substrate specificity of MAP. In general, the combinatorial and high-throughput nature of display techniques allows for interrogation of a large chemical space, and several studies have demonstrated that display techniques in conjunction with next-generation sequencing (NGS) can elucidate the substrate preference of promiscuous post-translational enzymes. ?−? ? ? ? As an in vitro technique, mRNA display has the advantages of extreme flexibility regarding noncanonical amino acid incorporation and post-translational (including chemical) modifications as well as superior diversity unrestrained by transformation efficiency, allowing for more complete sampling of a wider chemical space.? Moreover, our use of PURE system (protein synthesis using recombinant elements) enables the facile incorporation of unnatural amino acids ?−? ? as well as control over the presence/absence of N-terminal modification enzymes in translation reactions. In this work, we present the results of an mRNA display selection designed to select sequences based on their differential ability to undergo MAP cleavage of their N-terminal HPG. Briefly, short peptide-encoding mRNA display libraries containing an N-terminal HPG were subjected to CuAAC click attachment of a biotinylated azide linker and partitioned using streptavidin magnetic beads. Partitioned fractions were sequenced, and the relative abundance in sequencing results was used to calculate the extent of HPG cleavage for comparison of different library subgroups.

Results and Discussion

Library

Design

For the profiling experiment, four short peptide-encoding DNA libraries were prepared, each containing a randomized region consisting of NNS codons (N = A/T/G/C, S = G/C) (Figure). The N-terminal HPG (M) encoded by each library was followed by either Ala or a random amino acid in the P1′ position (“fixed” and “variable” libraries, respectively) and then either 8 or 4 random amino acids (“long” and “short” libraries, respectively), followed by a linker and His tag. Short libraries were limited in length to ensure complete coverage by NGS, whereas long libraries were extended to test further downstream positions. All libraries began with a constant 5′ region containing a T7 promoter, epsilon enhancer, and Shine–Dalgarno sequence and ended with a constant 3′ sequence for cross-linking a puromycin-containing oligonucleotide. To prevent library cross-contamination, variable and fixed libraries were designed with different 3′ constant regions such that they used different PCR and reverse transcription primers (Table S1).

MAP library design. Although the final leucine residue is encoded in the ORF, it is not translated during the mRNA–peptide fusion generation due to RNA modification by psoralen cross-linking at this site.

mRNA Display Selection

Overall, the profiling experiment consisted of two rounds of selection, in which libraries were biotinylated by CuAAC and retrieved on streptavidin beads; the first round was performed without MAP digestion to enrich for “clickable” sequences that could be successfully biotinylated by CuAAC, and the second was performed with MAP digestion to partition cleavable from uncleavable sequences (Figure). In the second round, libraries were split, and a parallel control selection was performed without MAP digestion to control for differential click efficiencies of various sequences.

Selection scheme. See Figure S1 for workflow details on post-translational purification and enzymatic digestion. Library fractions sequenced by NGS are indicated by boxes.

For round 1 of selection, library preparation largely followed the procedure used in our previous studies. ?,? DNA libraries were transcribed by T7 RNA polymerase, and the resulting RNA libraries were photo-cross-linked to a puromycin-containing oligo. Purified cross-linked RNA libraries were translated with PURE system containing Met analogue HPG in the place of Met, generating mRNA–alkynyl peptide fusions. Following the scheme given in Figure S1, fusions were captured on streptavidin UltraLink resin loaded with photocleavable biotinylated reverse transcription primers; beads were separated from unfused peptides, and reverse transcription and digestion with PDF were performed on-resin. Although we were previously able to N-terminally process long peptide libraries by inclusion of MAP and PDF in the translation mixture,? in this case, it was necessary to perform N-terminal processing as a post-translational enzymatic reaction (Figure S2). We hypothesize that this is because these peptides were too short to reach the end of the ribosomal exit tunnel, ?,? making their N-terminus inaccessible for cleavage. Fusions were recovered from the resin after photocleavage of the linker with a 365 nm UV light. After Ni-NTA purification to remove any unfused RNA and DNA, fusion libraries were subjected to a negative selection against M280 streptavidin magnetic beads to remove peptide sequences that bind to the beads. The remaining library was subjected to CuAAC to attach a biotin–PEG_11_–azide linker.

In round 1 of selection (without MAP cleavage), we noticed that fraction bound values, which would be 100% in the case of perfect click efficiency and bead uptake, were consistently between 38 and 53% (Table S2). Optimization experiments consisting of resubjecting libraries to a second round of click reactions before streptavidin pulldown or trying different streptavidin resins did not improve fraction bound values, so selection was carried forward as is.

For round 2 of selection, DNA was recovered from the round 1 bound fraction by on-bead PCR, and cDNA–mRNA–peptide fusion libraries were prepared in the same way as in round 1 with a few differences. Libraries were split in half after translation for a total of 8 libraries prepared in parallel. While the first half was subjected to just PDF digestion while on streptavidin Ultralink resin, the second half was subjected to both PDF and MAP digestion. We used concentrations of PDF (6 μM) and MAP (15 μM) that were previously determined to give high conversion for individual easily cleavable sequences.? Some MAP-digested libraries demonstrated unusually high binding during negative selection against M280 streptavidin magnetic beads (up to 11% fraction bound), so a second negative selection with streptavidin beads was performed in which we observed minimal binding in all libraries. After verifying that libraries could be successfully functionalized on a test scale with a large oligosaccharide azide previously used by the lab (Figure S3), libraries were then twice subjected to CuAAC with biotin–PEG_11_–azide and incubated with M280 streptavidin magnetic beads. Non-MAP-digested libraries demonstrated fraction bound values from 45 to 53%, very similar to those observed for round 1, while MAP-digested libraries demonstrated lower fraction bound values from 5 to 27% (Table S2), consistent with the expected removal of the N-terminal HPG alkynyl click partner. As predicted, fixed libraries demonstrated much lower fraction bound values than variable libraries, likely owing to the fixed Ala in the P1′ position known to be advantageous for MAP cleavage. DNA was recovered from both bound and unbound fractions and sequenced by NGS (Illumina HiSeq, GENEWIZ).

NGS Analysis

NGS results were processed to identify all unique peptide sequences in each library and to collect their frequencies. The 16 library DNA samples (4 libraries × ± MAP × bound/unbound fractions) were labeled with adapter barcodes and pooled in approximately equimolar amounts for sequencing. Starting from the raw Illumina paired-end reads reported in FASTQ files, we used Paired-End reAd MergeR (PEAR v0.9.11)? to assemble the reads using a PHRED quality score cutoff of 36 for trimming, a p-value of 0.001, and a minimum overlap of 100 bp. Typically, 80–91% of the reads were successfully assembled. Integrating SeqKit (v.2.2.0)? for FASTQ/FASTA file manipulation, a collection of AWK and BASH scripts were used to orient the fragments into the forward direction, crop off the 5′ and 3′ constant regions, and align the sequences at their start codon. Nucleotide sequences were then translated into peptide sequences, and those sequences containing any combination of HPG or stop codon in the randomized region, amino acid other than HPG in the P1 position, or “X” (indicating that the original nucleotide sequence included an uncalled base) were filtered out. While the proportion of HPG in the randomized region was highly variable across libraries, the combination of sequences filtered out based on other conditions never exceeded 3.5%.

To estimate the cleavage efficiency from NGS data, we needed to calculate the amount of cleaved and uncleaved peptides for each subgroup. This required two steps: (1) normalization to convert read counts to the amount of peptide and (2) correction to account for variability in click efficiency. To normalize data similar to how it was done in Jalali-Yazdi et al.,? we used NGS read counts to quantify the proportions of different subgroups in each fraction and scintillation counting to quantify the total amount of peptides (n _ x _) in each fraction (x = {U, B, dU, dB}, where U = unbound, B = bound, dU = MAP-digested unbound, dB = MAP-digested bound) as given in Table S3. For each peptide, we expected one cDNA molecule displayed and uniformly PCR amplified across the library for sequencing such that proportionality exists between the number of reads (N) and amount of peptides (n) as defined in eq. The peptide amounts of each subgroup (n _ x,i _, where i denotes amino acid residue at position P1′ or P2′, etc.) could therefore be determined by multiplying the total amount of peptides (n _ x _) by the ratio of the number of reads of a subgroup (N _ x,i _) and the total number of reads (N _ x _) of a library as defined in eq

[eqn]

[eqn]

To correct for the observed variability in the click efficiency and effectiveness of the pull-down procedures, we defined subgroup-specific click efficiency factors (f) calculated from bound versus unbound peptide quantities in round 2 non-MAP-digested libraries as shown in eq

[eqn]

Concerning round 2 MAP-digested libraries, the bound amount (n _dB,i _) corresponds to the peptides that the MAP enzyme failed to cleave and was successfully clicked, and the unbound amount (n _dU,i _) corresponds to the peptides that were cleaved plus those that were uncleaved but failed to click. By utilizing the click efficiency factor (eq), we can estimate the uncleaved but unclicked portion of the unbound peptides, which then can be used to find the total amount of cleaved peptides (n _ i _ ^ C ^) in a subgroup as shown in eq

[eqn]

Similarly, the total amount of peptides that remained uncleaved can be found by using eq

[eqn]

We defined a subgroup’s cleavage efficiency (E) as the ratio of the cleaved portion over the total amount of peptide in the subgroup as given in eq

[eqn]

We first examined the effect of the P1′ position on MAP cleavage efficiency. Since fixed libraries contained a fixed Ala in the P1′ position, percent cleaved calculations were conducted only on the variable-long and variable-short libraries. Sorting groups with each P1′ amino acid by percent cleavage, three tiers are evident (Figurea). The first tier, made up of P1′ = Ser, Pro, Thr, Ala, Gly, and Val, reaches the highest level of cleavage, approximately 85% across all sequences. The second tier, made up of P1′ = Asn, Ile, Gln, Cys, Glu, and Leu, exhibits intermediate (40–75%) levels of cleavage. The third tier, made up of P1′ = Asp, His, Phe, Trp, Lys, Tyr, and Arg, are cleaved very little (≤30%). Results from the long and short libraries were highly correlated, highlighting the consistency of the profiling experiment (Figure S4). It should be noted that, based on the previous literature, tier 1 and tier 3 subgroups were expected to cleave completely or not at all, while we actually observed values of 85% and 20% cleavages, respectively. This discrepancy may result from the fact that these are average cleavage values of diverse peptides within each subset; for instance, sequences with P1′ = Ala should normally be easily cleaved, but we found that they were not if they contained Cys at P2′ or P3′. Additionally, biases could result from an imperfect correction of noise introduced by click efficiency, PDF digestion bias, on-resin N-terminal processing, differential bead uptake efficiency, or other biases inherent to mRNA display selections. Regardless of the absolute scaling of the values, numerous trends based on relative ranking are clear and can be structurally rationalized.

Estimated percent cleavage values from subgroup analyses. Values from the P1′ subgroup analysis are presented as a ranked list (a) and plotted against the residue side-chain size (b). Values from P2′ (c), P3′ (d), and P4′ (e) subgroup analyses with selected P1′ = {Asn, Ile, Gln, Glu, Leu, or Asp} are presented only as ranked lists. Colors are used to qualitatively define cleavage efficiency tiers, with blue assigned to tier 1, orange to tier 2, and gray to tier 3. Data are averaged between variable-long and variable-short libraries, and error bars represent the standard deviation. Individual library data are given in Figures S4, S6–S8.

Among the three tiers noted above, the side-chain volume has a strong impact on the cleavage efficiency. Plotting the amino acid percent cleavage against side-chain volume reveals a clear sigmoidal trendline reaching a high plateau at the small side-chain size and a low plateau at large side-chain size (Figureb). Looking at side-chain structures, each addition to the side-chain length is detrimental to cleavage efficiency (i.e. Val vs Leu/Ile, Asn vs Gln). Compared with alkyl side chains of similar size, a polar but uncharged moiety appears to improve the cleavage efficiency (i.e. Val vs Thr, Leu vs Asn), potentially by allowing for additional stabilizing H-bonds. Acidic P1′ side chains are detrimental to the cleavage efficiency (i.e. Gln vs Glu, Asn vs Asp), with Asp being worse for cleavage than Glu despite having a shorter side chain. Aromatic and basic amino acids at P1′, which all have long or large side chains, all effectively inhibit MAP cleavage in the P1′ position.

It is interesting to note that, excluding Cys, the click efficiency factor values for each P1′ amino acid lay within a narrow range: f = 0.65–0.85 and 0.95–1.15 for the variable-long and variable-short libraries, respectively. This suggests that click efficiency was more influenced by library length than by the identity of the P1′ amino acid, excluding Cys, which was especially detrimental to click efficiency (f = 0.36 and 0.51 for variable-long and variable-short libraries, respectively). Among the other P1′ amino acids, despite the relatively small range, f values were nonetheless significantly correlated between long and short libraries (Figure S5).

Subgroup analyses were also conducted for P2′ and later positions in both fixed and variable libraries, but observed impacts on cleavage efficiency and correlations between libraries declined as the distance from the cleavage site increased. As would be expected from previous observations, the percent cleavage values were distributed over a smaller range and demonstrated smaller spread for the P2′ and later positions (Table S4), reflecting the large impact of the P1′ position, which is averaged in this type of subgroup analysis. To highlight the effect of the P2′ and downstream positions, we focused our analysis on peptides containing P1′ with an intermediate effect on cleavage (Asn, Ile, Gln, Glu, Leu, or Asp) for which we observed a greater spread in percent cleavage values for the P2′–P4′ positions. From the sorted percent cleavage graph of the P2′ position, the best two (Trp and Ala) and worst three (Pro, His, and Ile) P2′ amino acids for MAP cleavage are evident (Figurec), with high agreement maintained between variable-long and variable-short libraries (Figure S6c). Interestingly, the two best amino acids in the P3′ position are basic (Arg and Lys), while two of the worst amino acids are acidic (Glu and Asp), along with Gly and Asn (Figured, correlation in Figure S7c). In the P4′ position, acidic residues (Glu and Asp) were again disfavored, whereas Val appeared enriched (Figuree, correlation in Figure S8c). Consistent patterns beyond the P4′ position were difficult to discern because of even smaller ranges in cleavage efficiency and increases in variation between the variable-long and variable-short libraries. Except for Cys in most positions, we observed high cleavage efficiencies in all fixed library subgroups, so we could not discern clear patterns beyond the fixed P1′ position (Figure S9). Cys was a consistent outlier in fixed library analyses, resulting in a lower cleavage efficiency when present at any position.

To help rationalize some of these observations, we examined models of the interaction of the enzyme with various peptide substrates. Starting with a published crystal structure of MAP (PDB: 2MAT),? the electrostatic potential surface as calculated by the adaptive Poisson–Boltzmann solver (APBS) ?,? revealed negatively charged surface at the periphery of the binding pocket (Figurea). We then used AlphaFold 3 through AlphaFold Server? to model enzyme–substrate interactions with a collection of hexapeptides. The complex with what should be a highly efficiently cleaved hexapeptide, MAWRVS, shown in Figureb, was modeled with high confidence (ipTM = 0.96, pTM = 0.97) and is consistent with several of the substrate preferences observed in the selection. Met fits into the confirmed binding pocket observed in several crystal structures, ?,? and the P1′ side chain has very little space, consistent with the preference for small P1′ residues. Interestingly, the P2′ Trp side chain efficiently fills a groove on the enzyme surface, consistent with the high cleavage activity of P2′ Trp. Since the positively charged P3′ Arg is oriented along the negatively charged surface of the enzyme, this structure suggests electrostatic attraction as a factor favoring Arg and Lys at this position. Conversely, electrostatic repulsion of negative side chains in P3′ Glu and Asp is consistent with the lower cleavage of these substrates. Among Glu and Asp, the longer side chain of Glu is better able to distance itself from the negatively charged binding pocket and is less detrimental to cleavage (Figure S10). Electrostatic repulsion is also consistent with the slower cleavage of substrates with negative P4′ residues, although this is less obvious from the structure.

(a) Electrostatic potential surface of the MAP enzyme as calculated from the published crystal structure (PDB: 2MAT) using PyMOL 1.8 with the APBS plugin. , (b) AlphaFold-predicted structure (ipTM = 0.96, pTM = 0.97) of MAP (colored by electrostatic potential calculation) in complex with short peptide MAWRVS (see Figure S10 for other peptides modeled).

Since the amino acid residue at the P1′ position has the strongest impact on cleavage efficiency, further subgroup analyses split libraries into various P1′ × P2′ combinations or P2′ × P3′ combinations with a constant P1′ position. For the P1′ × P2′ analysis, the percent cleaved for each combination was well correlated between variable-short and variable-long libraries (Figure S11a) and plotted on a heatmap (Figurea) sorted by average values across columns and rows. As expected from the previous P1′ subgroup analysis, tier 1 amino acids and tier 3 amino acids demonstrate universally high and low cleavage efficiencies, respectively, regardless of the amino acid in the P2′ position. From the heatmap, we can also observe that Ala and Trp in the P2′ position promote cleavage while Pro in the P2′ position suppresses it. These trends are primarily visible when looking at the impact these P2′ amino acids have on P1′ tier 2 amino acids in the center columns of the heatmap. Cys behaves differently from other amino acids with a consistently intermediate cleavage efficiency when in the P1′ position and, when in the P2′ position, inhibitory impact on sequences with P1′ tier 1 amino acids but promoting impact on sequences with P1′ tier 3 amino acids.

Heatmaps of percent cleavage calculated from subgroup analyses of two-residue combinations: P1′ × P2′ using the averaged variable libraries (a), P2′ × P3′ using averaged fixed libraries (fixed P1′ Ala) (b), and P2′ × P3′ using averaged variable libraries (fixed P1′ Asn, Ile, Gln, Glu, Leu, or Asp) (c). Columns and rows are sorted by average values.

Subgroup analyses with a combination of P2′ × P3′ with a constant P1′ position yielded some additional observations. The first analysis with the fixed libraries confirms the known phenomenon that cleavage works well with Ala in the P1′ for most sequences, although the data suggest again that Cys interferes with cleavage slightly at either the P2′ or P3′ position (Figureb, correlation in Figure S11b). Similar subgroup analyses using the portions of variable libraries filtered and analyzed separately for fixed Ser, Thr, and Ala in the P1′ position yielded similar results (Figure S12). For these P1′ tier 1 amino acids, sequences cleaved efficiently with any amino acid in the P2′ and P3′ positions except for Pro, Cys, His, and a few other amino acids that varied according to the specific amino acid in the P1′ position. To best discern the impact of amino acid residues at the P2′ and P3′ positions, an analogous heatmap was generated from sequences with a tier 2 P1′ amino acid (Asn, Ile, Gln, Asp, Leu, or Asp) (Figurec, correlation in Figure S11c). This analysis suggests that Trp and Ala are advantageous in the P2′ position, allowing for relatively efficient cleavage for all amino acids in the P3′ position, with basic amino acids Lys and Arg being slightly preferred. Pro, His, Ile, and Glu were generally disfavored in the P2′ position, and Asp, Glu, and Gly were disfavored in the P3′ position.

Comparison

to Literature

In general, conclusions from selection data agree with previously published literature studies studying the cleavage efficiency of Met and HPG by MAP (summarized in Table S5). The importance of side-chain size of the amino acid in the P1′ position is universally acknowledged with the previous literature concluding that smaller side chains favor cleavage and larger side chains disfavor cleavage. The exact same six P1′ tier 1 amino acids were identified by Hirel et al.,? and all other previous literature list some subset of these 6 amino acids. ?−? ?,?−? ? ? Hirel et al. as well as Kawakami et al.? report that Cys in the P1′ position favors efficient cleavage, further supporting the hypothesis that Cys was mildly incompatible with our assay due to interference with the click reaction. Considering the more subtle impact of amino acid residues in downstream positions, there is generally less agreement concerning the P2′ position, with only Frottin et al.? reporting that Trp favors cleavage and that both Pro and Glu disfavor cleavage. Frottin et al. also suggest that Cys in positions further downstream may disfavor cleavage efficiency. While some publications have reported a negative impact of acidic residues at the P3′ position, we are the first to detect a positive impact of basic residues Lys and Arg at that position.

Kinetic Assay

To validate the selection results, a collection of heptapeptides was prepared by Fmoc solid-phase synthesis (Scheme S1) and assayed for cleavage efficiency by an in vitro time-course kinetic experiment adapted from Merkel et al., using LC/MS to detect peptide cleavage.? Heptapeptide sequences were chosen with various P1′–P3′ amino acids to assay a range of fast-to-intermediate-cleaving sequences as predicted by the profiling experiment, including a handful of sequences tested with both HPG (X) and natural methionine (M) at the N-terminus. Cleavage reactions were run either with a low enzyme concentration and excess peptide substrate (0.43 μM MAP:180 μM peptide) for fast-cleaving sequences or with a high enzyme concentration similar to the mRNA display assay and peptide in equimolar amounts (15 μM MAP; 15 μM peptide) for intermediate-cleaving sequences. Reactions were run at RT, and at each time point, a small aliquot was removed and quenched with EDTA. Samples were then diluted with water containing HOBt as quantification standard and run on LC/MS with single ion recording. Percent cleavage was calculated from the standardized area under the curve relative to the initial measurement. All assays were performed in triplicate.

To visualize the data, percent cleavage values were plotted in Prism v10.6.1 (Figurea–c, full time course in Figure S17). Peptides starting with natural Met and assayed with low enzyme concentration could be split into 4 tiers: (1) MAT, MAW, and MSW, which were approximately fully cleaved by 10 min, (2) MAE, MGT, and MSE, which were approximately fully cleaved by 20 min, (3) MAP, which was approximately fully cleaved by 130 min, and (4) MGP and MEP, which did not demonstrate any observable cleavage within the time frame of the assay. Similarly, peptides with N-terminal HPG and assayed at low enzyme concentration could be split into 3 tiers: (1) XAW and XSW, which were mostly cleaved by 36 min, (2) XAT, XAE, and XSE, which were mostly cleaved by 130 min, and (3) XGP, XAP, and XEP, which did not demonstrate observable cleavage within the time frame of the assay. In all cases, peptides with N-terminal HPG demonstrated a lower cleavage efficiency than their analogous counterparts with N-terminal Met. A handful of additional peptides with N-terminal HPG as well as the three peptides that previously did not demonstrate observable cleavage were assayed with the high enzyme concentration and could be split into 4 tiers based on cleavage achieved by the final time point at 7 h: (1) XNWR and XAP, which were mostly cleaved by 7 h, (2) XNW and XNWD, which reached intermediate partial cleavage by 7 h, (3) XNE and XGP, for which only low levels of cleavage were observed, and (4) XEP, which did not demonstrate an observable cleavage within the time frame of the assay.

Percent cleavage observed for synthetic heptapeptides with N-terminal Met (a) or HPG (b,c) treated with MAP and quantified by LC/MS (triplicate data). Assays were run either with 0.43 μM MAP and excess peptide substrate (180 μM) (a,b) or with high MAP concentration (15 μM) equimolar with peptide (15 μM) (c). Peptides start with the sequence displayed (including N-terminal Met or HPG (=X)), followed by TYNK (except for XNWR and XNWD, which have R and D, respectively, at their P3′ position). Error bars represent standard deviation. Colors correspond to cleavage efficiency tiers as in Figure (blue = tier 1; orange/brown = tier 2; gray = tier 3).

Taken together, our kinetic study indicates the following cleavage efficiency trends at each given position: (1) P1: Met > HPG; P1′: Ala ≅ Ser > Gly ≫ Asn; P2′: Trp > *Thr

Glu ≫ Pro; P3′: Arg > Thr > Asp, with the caveat that MAT (but not XAT) cleaved more efficiently than analogous peptides with Trp in the P2′ position. These results are largely consistent with the profiling experiment. We hypothesize that the mRNA display profiling experiment, with its high MAP concentration greatly in excess of library, was not optimized to differentiate between the fastest-cleaving sequences with Ala/Ser versus Gly at the P1′ position. However, this allowed us to better discern trends in sequences with intermediate cleavage efficiency and more importantly to have the sensitivity to discern trends in further downstream positions such as the electrostatic component of substrate preference at the P3′ position.

Conclusion

In summary, we have utilized mRNA display and NGS to profile the substrate specificity of MAP with unnatural peptides containing N-terminal HPG. Analysis of cleavage data across various subgroups of peptides indicated a strong preference for cleavage with small and nonacidic residues in the P1′ position, a preference for Trp and Ala and aversion for Pro, His, and Ile in the P2′ position, a weak preference for basic residues and aversion for acidic residues in the P3′ position, potentially a weak aversion to acidic residues in the P4′ position, and no discernible trends at positions further downstream except for an aversion for Cys. Several of these observations could be rationalized by predicted enzyme–substrate complex modeling. To validate selection results, we chemically synthesized several heptapeptides and conducted time-course kinetic assays, the results of which largely agreed with selection data. It is important to note that while we designed the selection experiment to correct for variations in CuAAC biotinylation and pulldown efficiency, it is possible that overcorrection or undercorrection may affect the results, especially for cysteine-containing peptides where the correction was large. While our results correspond well with the previous literature using both N-terminal Met and HPG, the assay could potentially be redesigned with repeated rounds of selection to increase signal/noise, or with natural, biologically relevant Met using novel redox-based Met-specific bioconjugation.? These results may assist in the design of peptides and proteins with better control over the cleavage of N-terminal Met or HPG.

Supplementary Material

Bibliography36

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Heinrich, T. ; Zenke, F. T. ; Bomke, J. ; Gunera, J. ; Wegener, A. ; Friese-Hamim, M. ; Hewitt, P. ; Musil, D. ; Rohdich, F. Chapter 4.2 - Methionine Aminopeptidases. In In Metalloenzymes; Supuran, C. T. , Donald, W. A. , Eds.; Academic Press, 2024; pp 343–373.
2Das B. C.Chokkalingam P.Shareef M. A.Shukla S.Das S.Saito M.Weiss L. M.Methionine Aminopeptidases: Potential Therapeutic Target for Microsporidia and Other Microbes J. Eukaryot. Microbiol.2024715 e 1303610.1111/jeu.1303639036929 PMC 11576263 · doi ↗ · pubmed ↗
3Majekodunmi T.Britton D.Montclare J. K.Engineered Proteins and Materials Utilizing Residue-Specific Noncanonical Amino Acid Incorporation Chem. Rev.2024124159113913510.1021/acs.chemrev.3c 0085539008623 PMC 11327963 · doi ↗ · pubmed ↗
4Ngo J. T.Tirrell D. A.Noncanonical Amino Acids in the Interrogation of Cellular Protein Synthesis Acc. Chem. Res.201144967768510.1021/ar 200144 y 21815659 PMC 3178009 · doi ↗ · pubmed ↗
5Bailey J. K.Horiya S.Neralkar M.Horvath V.Nakamoto K.Temme J. S.Turra R. J.Krauss I. J.In Vitro Selection of Cyclized, Glycosylated Peptide Antigens That Tightly Bind HIV High Mannose Patch Antibodies ACS Cent. Sci.202511112210.1021/acscentsci.5c 0053940726783 PMC 12291119 · doi ↗ · pubmed ↗
6Frottin F.Martinez A.Peynot P.Mitra S.Holz R. C.Giglione C.Meinnel T.The Proteomics of N-Terminal Methionine Cleavage Mol. Cell. Proteomics 20065122336234910.1074/mcp.M 600225-MCP 20016963780 · doi ↗ · pubmed ↗
7Xiao Q.Zhang F.Nacev B. A.Liu J. O.Pei D.Protein N-Terminal Processing: Substrate Specificity of Escherichia Coli and Human Methionine Aminopeptidases Biochemistry 201049265588559910.1021/bi 100546420521764 PMC 2906754 · doi ↗ · pubmed ↗
8Ben-Bassat A.Bauer K.Chang S. Y.Myambo K.Boosman A.Chang S.Processing of the Initiation Methionine from Proteins: Properties of the Escherichia Coli Methionine Aminopeptidase and Its Gene Structure J. Bacteriol.1987169275175710.1128/jb.169.2.751-757.19873027045 PMC 211843 · doi ↗ · pubmed ↗