Large Data Set Analysis Reveals Structural Origin of Peptide Collisional Cross Section Bimodal Behavior
Allyn M. Xu, Dániel Szöllősi, Helmut Grubmüller, Oded Regev

TL;DR
This study explains why some peptides have two distinct collisional cross-sectional area modes using machine learning and simulations.
Contribution
The paper identifies basic site positioning as a key factor in determining peptide CCS modes.
Findings
Peptides in the high CCS mode adopt extended, helical conformations.
Low CCS mode peptides tend to have compact, globular structures.
Protonation near the C-terminus supports helix formation in high CCS mode.
Abstract
Recent advances in ion mobility spectrometry have enabled the measurement of rotationally averaged collisional cross-sectional area (CCS) for millions of peptides as part of routine proteomic mass spectrometry workflows. One of the most striking findings in recent large ion mobility data sets is that CCS exhibits two distinct modes, most notably for charge 3+ peptides, with peptides predominantly exhibiting CCS in either the high or low mode. Here, using classical machine learning approaches, we identify that basic site positioning is a key sequence feature determining a peptide’s CCS mode. Molecular dynamics simulations suggest that peptides in the high CCS mode tend to adopt more extended conformations and form charge-stabilized helical structures, whereas those in the low CCS mode adopt more compact, globular conformations. Further supporting this structural hypothesis, we provide…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
1
2
3
4- —Division of Molecular and Cellular Biosciences10.13039/100000152
- —Simons Foundation10.13039/100000893
- —Max-Planck-Gesellschaft10.13039/501100004189
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMass Spectrometry Techniques and Applications · Advanced Proteomics Techniques and Applications · Protein Structure and Dynamics
Introduction
Ion mobility spectrometry (IMS) is a powerful, widely used technique for measuring the size and shape of gas-phase analytes with applications to many fields, including structural biology,? proteomics, ?−? ? and more.? In IMS, ions under the influence of an electric field are separated based on collisions with buffer gas, ?,? enabling measurement of the rotationally averaged collisional cross-sectional area (CCS), thereby providing insights into the conformation of analytes. IMS has been used in structural studies of biomolecules, complementing other methods such as crystallography, cryo-electron microscopy, or NMR spectroscopy. ?,?,? More recently, IMS paired with high-throughput mass spectrometry has enabled enhanced separation and analysis of complex analyte mixtures. ?−? ?
IMS has also been extensively used to study the conformations of polypeptides in the gas phase. ?,?,? Studying peptides in the gas phase can be informative for many reasons. The gas phase provides a simplified environment for studying intramolecular forces due to the absence of solvent interactions. ?,? Moreover, the gas phase may be a more representative environment than the solution phase for certain biological environments, such as in the interior of the cell membrane, which has a low dielectric constant of ϵ ≈ 3.? There are some notable differences in the formation of a secondary structure for peptides in the gas phase. Specifically, various studies, employing IMS combined with molecular dynamics (MD) simulations, have shown that α-helices more readily form when in alignment with the peptide’s charges. ?−? ? ? ? ? For example, the peptide ion Ac–Ala_ n –LysH^+^ (n = 10–20) displays a higher CCS consistent with an α-helix, whereas Ac–LysH^+^–Ala n _ displays a lower CCS consistent with a compact, globular structure. ?,? This difference is attributed to the fact that the C-terminal proton interacts favorably with the negative end of the α-helix’s macrodipole, which arises from the arrangement of the peptide backbone’s N–H and CO functional groups.? On the other hand, the N-terminal charge destabilizes the α-helix, and instead, the charge is solvated directly by the carbonyl groups. Similar studies have found evidence for other compact conformations, such as a folded-helix, depending on the position of charges. ?,? Despite these insights, these studies are restricted to peptides composed mostly of alanines, an amino acid with notably high helix propensity,? and there still remain many unanswered questions about gas-phase conformations of peptides in general.
Recently published large-scale data sets have provided a general outlook on CCS across millions of peptides. ?,? In these works, a single CCS is assigned to each peptide, corresponding to the most intense measured peak (even though some peptides can exhibit multiple stable conformations?). Interestingly, when CCS is plotted against mass, two distinct CCS modes appear, most prominently seen for charge 3^+^ peptides (FigureA). It was noted that this bimodality shares similarities with the extended and compact conformations observed in the previous polyalanine α-helix studies. ?,? However, it is still not fully understood how a peptide’s sequence influences its propensity to adopt the low or high CCS mode, or more generally, its CCS. In principle, this could depend on a complex interplay of structure, intramolecular forces, and the dynamics of gas-phase charges. ?−? ? ? ?
*CCS bimodality stratified by select peptide features. (A) 2D hexagonal binning plot of CCS versus mass for charge 3+ peptides in Meier et al.’s data set, with marginal densities for CCS and mass shown on right and top axes, respectively (left panel). Analysis workflow, including the same binning plot in the left panel, but colored red and blue for the low and high CCS mode, respectively (right panel). (B) 2D hexagonal binning plot of CCS versus mass for charge 3+ peptides with two basic sites (left), three basic sites (middle), and four basic sites (right). (C) 2D histogram of peptide length versus distance (in amino acids) of the internal basic site to the C-terminus for charge 3+ non-Nt-acetylated tryptic peptides with three basic sites, colored by the fraction of peptides in the bin with ΔCCS
0 (difference between CCS and CCSseparator; see the Methods section). Marginal density histograms on right and top axes, respectively (gray boxes). Cartoon of the peptide shown at the top, with the internal basic site colored in teal, and C-terminal arginine or lysine colored in blue. (D) 2D histogram of peptide length versus proline count, with coloring and marginal histograms as in panel (C), for peptides with an internal basic site with distance to C-terminus less than or equal to 5 (left), between 6 and 7 (middle), and greater than or equal to 8 (right). Bins with less than 8 peptides were excluded. (E) Backbone ribbon and side chains stick representation of MD simulation for representative three basic site peptides from the high and low CCS modes, respectively. Basic sites are shown in bold. (F) Plot of the fraction of MD simulation runs where a given residue was involved in an α-helix, for each of the two representative peptides.*
Here, we analyze a large-scale CCS data set? and perform MD simulations to investigate structural explanations for the observed bimodality. We find that peptides having at least two basic sites in the C-terminal region strongly favor the high CCS mode, whereas those with one or no basic sites in that region favor the low CCS mode. MD simulations demonstrate that in the former case, two C-terminal protonated basic sites form a cap stabilizing a α-helix, providing a structural explanation for the high CCS mode. MD simulations also show that in the latter case, peptides tend to form compact, globular conformations, explaining the low mode. To further support this helix formation hypothesis, we use interpretable machine learning models that reveal the position-dependent effects of amino acids.? These models provide evidence that residues near the C-terminal end are preferentially protonated, consistent with helix macrodipole effects. They also uncover (1) a preference in high CCS mode peptides for hydrophobic amino acids in the region expected to form a helix as well as (2) a helix-breaking effect of glycine in high (but not low) CCS mode peptides, an effect we confirm with additional MD simulations. All of these findings are physicochemically consistent with the helix formation hypothesis. The insights provided here into peptide gas-phase conformations and their sequence determinants should inform future ion mobility-based proteomics pipelines.
Methods
Data Collection
A postprocessed CCS data set was obtained from Meier et al.? As detailed in their work, peptides were generated from a variety of organisms (C. elegans, D. melanogaster, E. coli, H. sapiens, and S. cerevisiae) using three proteases (trypsin, LysC, and LysN), processed through fractionated LC-TIMS-MS/MS runs on a TIMS-quadrupole TOF mass spectrometer, and analyzed using MaxQuant software. For each peptide and charge state, a single CCS was selected per LC-TIMS-MS run based on the most abundant identified feature. Finally, each peptide and charge state was assigned a unique CCS value, obtained from aligning and averaging across LC-TIMS-MS runs.
The data set contains CCS values for 559,979 unique combinations of peptides and charge states. For this study, we focused on the 143,850 peptides with charge 3^+^. The peptide lengths range from 8 to 55 amino acids with a median of 19.
Separating Peptides Based on Low and High CCS Modes
To identify the low and high modes of the CCS bimodality (FigureA), we first fit a Gaussian mixture model (n = 2) on the 2D data set of CCS versus mass. Classifying the points based on the Gaussian mixture model, we observed that the separator was nearly linear. For simplicity, we then fit a linear separator on the Gaussian mixture model classification using squared hinge loss (regularization parameter = 1.0). The resulting linear separator is given by
where CCS is given in Å^2^ and mass in Da.
For each peptide, we computed the difference between its CCS and CCS_separator_, referred to as ΔCCS. Lastly, a peptide was denoted as belonging to the low or high CCS mode if ΔCCS ≤0 or ΔCCS >0, respectively.
Predicting the CCS Mode and Absolute CCS Using Generalized Additive
Models
We trained generalized additive models (GAMs)? to investigate the sequence determinants of both the CCS mode and absolute CCS. Specifically, we trained four GAMs: two logistic GAMs to predict CCS mode and two linear GAMs to predict absolute CCS. The input to all models was the peptide sequence (as a string of amino acids), and the dependent variable was either the CCS mode (coded as 0 for the low mode and 1 for the high mode) or absolute CCS (in Å^2^).
For the purpose of GAM analysis, we considered only unmodified tryptic peptides with charge 3^+^, exactly three basic sites, and a minimum length of 15 residues from Meier et al.’s data set.? From this filtered set, we defined four training subsets. For the CCS mode prediction, one subset consists of peptides whose internal basic site is within five residues of the C-terminus and another consists of peptides whose internal basic site is at least seven residues away; a separate model was trained on each. For absolute CCS prediction, the first subset was restricted to high-mode peptides, and the second subset was restricted to low-mode peptides; again, a separate model was trained on each.
GAM predictions are based on the total contribution of (1) the identity and relative position of each non-C-terminal amino acid in the peptide sequence, (2) the peptide length in residues, and (3) the identity of the C-terminal amino acid (R or K).
We designed a specialized “shared spline” architecture to enable the GAMs to operate on sequential inputs and capture position-specific contributions of amino acids. Each of the 20 standard amino acids is assigned a learned spline function (denoted as f _ a _ for amino acid a). For a peptide sequence a 1 a 2···a _ n _, the contribution of the ith non-C-terminal amino acid a _ i _ is given by its corresponding spline function f _ a _ i _ _, evaluated at the amino acid’s relative position within the peptide. This relative position is given by , ranging linearly from 0 (N-terminal) to 1 (penultimate residue). In other words, the spline f _ a _ encodes the contribution that amino acid a as a function of its relative position.
The contribution of peptide length is captured by a learned spline f len, and the contribution of the C-terminal amino acid is given by an indicator variable equal to 0 if the residue is arginine (R), and 1 if lysine (K)multiplied by a learned coefficient β_C‑term. K_.
Together, the total contribution of a peptide input sequence a 1 a 2···a _ n _ is
For CCS mode GAMs, the output is the standard logistic function of the total contribution above, interpreted as the predicted probability that the peptide belongs to the high CCS mode. For the absolute CCS GAMs, the total contribution is interpreted directly as the predicted CCS value (in Å^2^).
The models were implemented in Python using the pyGAM library,? with custom modifications to support the shared spline architecture. Hyperparametersincluding the number of splines and smoothing penalties on derivative and magnitudewere tuned to balance the model fit and spline smoothness. Code and trained models are linked in the “Code availability” section.
Molecular Dynamics Simulations
In order to sample the conformational space of the peptide sequences in question, we performed fully atomistic in vacuo MD temperature quenching simulations. Two sets of sequences were tested (P1: GGGNQVSLLNVVMDLKK and P2: GGGLAHVGFDNATFLSER), each with the position of the glycine triplet progressively shifted toward the C-terminus, resulting in 14 and 15 peptides tested, respectively. Fully extended conformations were generated using Pymol (ver. 3.0 Schrödinger, LLC.) to ensure unbiased starting structures.
All peptides carried a charge of 3^+^, assigned to the basic amino acids and the N-terminus. Acidic amino acids were protonated and thus neutral. After topology generation, the peptides were placed in a cubic box with a side length of 100 nm. Energy minimization was performed using the steepest descent integrator with 3000 steps, followed by a three-step equilibration procedure with increasing time steps of 0.1 fs, 0.5, and 1 fs with position restraints and under an NVT ensemble. The resulting structures served as a starting point for the quenching simulations.
A simulated annealing temperature quench was applied to allow the peptides to reach an equilibrium conformation. Specifically, the initial temperature was set to 600 K and maintained for 10 ns, then linearly decreased to 305 K over a time period of 0.5 μs, and subsequently kept at 305 K for another 40 ns to facilitate thermal equilibration.
The net charge of the system and the absence of solvent required a specific set of simulation parameters: (i) GROMACS (version 2023.4)? was compiled in double precision and run with a shorter than usual 1 fs integration time step; (ii) cutoff distances were set to 30 nm, such that all peptide atoms interacted with all other atoms explicitly via Coulomb and Lennard-Jones forces at all times, allowing a long neighbor search interval (1 ns) and therefore enhancing performance; (iii) the particle mesh Ewald method was not used, as it would fail given the total net charge of the system; and (iv) pressure coupling was disabled. The system was periodic, and the center of mass was kept in the center of the simulation box. The temperature was controlled by the V-rescale algorithm.?
All simulations were performed with the CHARMM36m force field.? The quenching procedure was repeated 1000 times for each peptide, with starting velocities randomly drawn from a Boltzmann distribution separately for each run. All atomic positions were saved every 10 ns.
Results and Discussion
CCS Bimodality of Peptides with Three Basic Sites Is Largely
Determined by the Internal Basic Site Position
Charge 3^+^ peptides in Meier et al.’s data set display a bimodal distribution when CCS is plotted versus mass (FigureA). We therefore separated the peptides based on whether they exhibit CCS in the low or high mode (see Methods). We found that approximately 60% of these peptides belong to the low mode, while 40% belong to the high mode, and this bimodality persists even when considering peptides with a specific number of basic sites (arginine, lysine, histidine, or N-terminal amine group) (FigureB). These two modes are largely distinct and well-separated, except in the lower mass ranges. To ensure that only peptides with well-defined modes were considered, we limited our analysis to those with 15 or more amino acids. Furthermore, we restricted our analysis to tryptic and LysN peptides that were not N-terminal acetylated (Nt-acetylated), which made up the majority (98%) of the data set (see Methods for details). For these 120,466 peptides (mass range: 1370–4600 Da), we investigated the sequence features that explain their CCS bimodality.
In this first section, we focus on the 60,396 peptides with exactly three basic sites because, for such peptides, the 3^+^ charge likely localizes to the three basic sites. We note that the tryptic peptides among these have one N-terminal basic site, one C-terminal basic site, and one internal basic site. In contrast, the LysN peptides have two fixed basic sites at the N-terminus.
Interestingly, for tryptic peptides, we observed a distinct transition between the two CCS modes, depending on the position of the internal basic site (FigureC). Specifically, among tryptic peptides with an internal basic site located near the C-terminus (within five amino acids), 68% exhibited CCS in the high mode, whereas among tryptic peptides with an internal basic site farther from the C-terminus (at least seven amino acids), 79% exhibited CCS in the low mode. This trend was most pronounced in peptides between 15 and 25 amino acids in length, while it was diminished in peptides longer than 25 amino acids. With their basic sites located further toward the N-terminus, LysN peptides extended this trend, showing a similar percentage of 81% in the low mode when the (sole) internal basic site is near the C-terminus, and 99% otherwise (Supplementary Figure S1). Together, these findings demonstrate that basic sites in the proximity of the C-terminus promote the high CCS modemarkedly so when two such sites are present.
One explanation for these observations is that protonated basic sites located near the C-terminus may induce charge-stabilized helical-like structures, leading to high CCS values. Specifically, such sites can interact favorably with backbone carbonyl groups that, in a helical conformation, are exposed toward the C-terminal end. Similar structures have been observed in MD simulations, suggesting their viability as candidate conformers.? In contrast, protonated basic sites located farther from the C-terminus can destabilize helical structures due to unfavorable electrostatic interactions with the helix macrodipole. This can result in more compact, globular conformations that exhibit low CCS values.?
This reasoning aligns with that presented by Counterman and Clemmer,? whose MD simulations showed that charge 3^+^ polyalanines favor extended helical conformations when charges are predominantly located in the C-terminal half and compact conformations when charges are predominantly located in the N-terminal half. Our observations suggest that similar structural principles apply much more broadly to a wide range of peptides. We note that Counterman et al. focused on the relative position of charges along the peptide (i.e., as a fraction of peptide length), whereas our findings suggest that it is the absolute position (specifically, having two basic amino acids within 5–7 residues of the C-terminus) that plays a more critical role. Indeed, we observed little evidence that relative position matters beyond absolute position (FigureC, Supplementary Figure S1).
To further support the hypothesis that the high CCS mode largely consists of charge-stabilized helical-like structures, we evaluated the effect of proline and lengthtwo features that we expected to disrupt such conformations. Namely, proline is known to break helical structures,? while longer peptides exhibit greater conformational freedom, which we reasoned may make it more difficult to maintain stable helical conformations. Indeed, among tryptic peptides with three basic sites, we observed that the fraction of peptides in the high CCS mode decreased with both an increasing proline count and an increasing peptide length, even when stratified by the internal basic site position (FigureD). These results are consistent with the interpretation that the high CCS mode corresponds to extended, helix-like conformers.
To provide additional support for the helical hypothesis, we performed MD simulations on two charge 3^+^ peptides with three basic sites: one from the high CCS mode with the internal basic site adjacent to the C-terminal lysine (GGGNQVSLLNVVMDLKK; CCS = 515 Å^2^, ΔCCS = 6.7 Å^2^) and the other from the low CCS mode with the internal basic site closer to the N-terminus (LAGGGHVGFDNATFLSER; CCS = 490.7 Å^2^, ΔCCS = – 27.9 Å^2^). In the simulations, the high-mode peptide frequently formed a helical conformation (FiguresE, top; ?F, left; Supplementary S5A), with the helix capped at the C-terminal end by two lysine charges. These two lysine charges were solvated by the backbone carbonyl groups of the helix and of the capping structure (FigureE, right). This structural arrangement supports the hypothesis that extended helix-like conformations can be stabilized by electrostatic interactions between C-terminally localized charges and the helix macrodipole. In contrast, the low-mode peptide adopted compact, globular conformations with a minimal secondary structure (FiguresE, bottom; ?F, right; Supplementary S5B), consistent with the expectation that basic sites located away from the C-terminus destabilize helices.
In summary, our results show that peptides with three basic sites tend to adopt a high CCS mode when two of those sites are located near the C-terminus. This supports a model in which gas-phase peptide conformation reflects a balance between helix-promoting and helix-disrupting features, with C-terminal charge localization favoring extended, helical-like structures. The additional effects of proline and peptide length, together with simulation-based evidence of charge-stabilized helices, further reinforce this framework. Our findings align with prior studies on polyalanine and suggest that charge-stabilized helical conformations may be a general structural motif across a wide range of peptide sequences.
Position of the Second But Not the First Internal Basic Site
Affects the CCS Mode of Peptides with Four Basic Sites
To further elucidate the CCS bimodality, we next examined charge 3^+^ peptides with more than three basic sites. Specifically, we examined non-Nt-acetylated tryptic peptides with four basic sites, again restricting our analysis to peptides that are at least 15 amino acids long. These peptides contain an amine group at the N-terminus, an arginine or lysine at the C-terminus, and two internal basic sites whose positions vary across the peptides. Note that with four basic sites, the localization of the 3^+^ net charge is clearly ambiguous.
Interestingly, we found that the CCS mode strongly correlated with the position of the second internal basic site and is largely independent of the position of the first. Specifically, peptides tended to adopt the high CCS mode when the second internal basic site was located within 8–10 residues of the C-terminus and the low CCS mode otherwise (FigureA). One explanation for this preference is that an internal basic site near the C-terminus enables a configuration in which two of the three protons can localize toward the C-terminal enda distribution we hypothesized promotes the high CCS mode by facilitating charge-stabilized helix formation. This would explain why the high CCS mode is largely observed only when the second internal basic site is positioned sufficiently close to the C-terminus. In contrast, when no internal basic site is positioned near the C-terminus, such a charge distribution might not be possible. This may explain why these peptides predominantly fall into the low CCS mode.
CCS bimodality in peptides with four basic sites. 2D histogram of the distance of the first versus second internal basic site from the C-terminus for charge 3+ non-Nt-acetylated tryptic peptides with four basic sites, colored by the fraction of peptides in the bin with ΔCCS >0. Bins with less than 8 peptides were excluded. Marginal density histograms on right and top axes, respectively (gray boxes). (A) All analyzed peptides. (B) Peptides whose internal basic sites are an arginine or lysine, followed by a histidine. Cartoon of the peptide shown at the top, with internal basic sites colored in teal, and C-terminal arginine or lysine colored in blue.
Interestingly, we observed that this positional effect persists even when the first internal basic site is an arginine or a lysine and the second is a histidine (FigureB). The side chains of arginine and lysine are more basic than that of histidine, whose gas-phase basicity is generally comparable to that of the N-terminal amine. ?,? Given that the four basic sites compete for three protons, one would not expect histidine to be a favored site of protonation in such peptides. However, our findings suggest that histidine is frequently protonated when it is positioned near the C-terminal end. This is consistent with the idea that charge localization is influenced not only by the basicity of the side chains but also by their spatial alignment with structural featuresparticularly the macrodipole associated with a helical-like conformation. Indeed, prior studies have shown that macrodipole effects can shift the apparent basicity or acidity of functional groups along the peptide backbone, favoring charge retention near the C-terminus.?
Furthermore, a similar pattern emerged, even in peptides with more than four basic sites. Specifically, in charge 3^+^ tryptic peptides with five basic sites (N-terminal amine, three internal basic residues, and C-terminal R/K), the CCS mode largely depended on whether the last internal basic site was positioned near the C-terminusregardless of the positions of the other two internal basic sites (Supplementary Figure S2). Consistent with the reasoning above, this supports the notion that a helix may form when the protonation pattern enables stabilization by C-terminal charges and, conversely, that a C-terminal basic site is preferentially protonated when it aligns with a helical conformation, even when multiple sites compete for charges.
Based on these findings, we speculate that helix-like conformations and the associated macrodipole interactions may play a key role in guiding charge localization in the gas phase. Moreover, our results point to a feedback relationship: charge distribution can influence peptide structure, and structural featuressuch as alignment with the macrodipolecan, in turn, influence where charges are stabilized. This interplay offers a plausible mechanistic explanation for the observed CCS bimodality, in which peptides with basic sites clustered near the C-terminus tend to favor extended, helix-like conformations.
Sequence Features Influencing the CCS Mode
Despite the overall importance of internal basic site positioning in determining the CCS mode, other sequence features also have an effect. Indeed, even when the internal basic site is near the C-terminus, some peptides are in the low mode, and conversely, even when the internal basic site is far from the C-terminus, some peptides are in the high mode (FigureC). To systematically investigate these effects, we trained GAMs? to predict the CCS mode (see Methods). Each GAM prediction is modeled as the sum of the position-dependent contributions from individual residues. Specifically, the model includes one learned spline for each of the 20 amino acids, capturing how residue contributions vary with their relative positions within peptides. This design allows us to capture potential amino acid effects that are dependent on their sequence position. Additionally, we included a spline function to model peptide length effects and a categorical variable to account for the identity of the C-terminal amino acid (R or K).
We focused our analysis on unmodified tryptic peptides with three basic sites and a minimum length of 15 residues, similar to the set analyzed in the first results section. Given that the internal basic site’s position strongly influenced the CCS mode (FigureC), we stratified peptides into two subgroups: those where the internal site was within five residues of the C-terminus and those where it was more than seven residues away. We then trained a separate GAM for each subgroup, achieving high prediction accuracy (AUROC = 0.883 and 0.840, respectively).
First, we analyzed the learned spline functions for peptides in the subgroup where the internal basic site was near the C-terminus (FigureA, red). As described above, we hypothesize that peptides in this subgroup tend to adopt extended α-helical structures stabilized by a pair of charges near the C-terminus (Figure). Interestingly, the amino acid splines exhibit position-dependent contributions that are generally more pronounced in a broad central region and slightly skewed toward the C-terminus (FigureA). These patterns suggest that this region plays a structurally relevant role in determining the CCS mode, potentially indicative of structural elements such as α-helices.
Logistic GAM models predicting the CCS mode. (A,B) GAM models trained to predict the CCS mode. Shaded regions indicate 95% confidence intervals. Each amino acid (except the C-terminal one) contributes additively to the final prediction on the logit scale, depending on its relative position along the peptide, as shown in the corresponding panel (A). The additive logit contribution of length and of the C-terminal amino acid are shown in (B). C† refers to carbamidomethylated cysteine; R, K* refer to non-C-terminal arginine and lysine, respectively. (C) The contribution of each amino acid, averaged over all positions and over the two modes, as a function of its Kyte-Doolittle hydropathy. Carbamidomethylated cysteine is excluded.*
Notably, amino acids with nonaromatic hydrocarbon side chains (A, V, I, L) exhibit similar spline patterns, generally favoring the high CCS mode when located in this central region (FigureA). In line with our hypothesis, hydrocarbon side chains may be less disruptive to helix formation in the gas phase due to their nonpolar nature, which may explain their observed propensity to favor the high CCS mode. Indeed, valine has been shown to exhibit high α-helix propensity in the gas phase, in contrast to its behavior in an aqueous solution.? In fact, more broadly, spline contributions generally correlate with amino acid hydrophobicity (FigureC): hydrophobic residues promote the high CCS mode, whereas hydrophilic residues such as Q, N, E, and D tend to decrease this likelihood. While hydrophobicity is not a direct proxy for gas-phase polarity, the observed correlation is consistent with the notion that residues with polar side chainsoften classified as hydrophilicmay hinder α-helix formation in the gas phase through electrostatic disruption.? In addition, proline exhibited the most pronounced spline pattern, substantially reducing the likelihood of adopting the high CCS mode when it is centrally located in the peptide (FigureA). This aligns closely with its established role as a helix breaker? and is in agreement with our earlier observations (FigureC). Lastly, peptides terminating in lysine favored the high CCS mode more strongly than those ending in arginine (FigureC). This may result from lysine’s smaller, more flexible side chainpotentially better suited to helix-cappingthan the bulkier, planar guanidinium group of arginine. Collectively, these observations support the physicochemical relevance of the GAM outputs and align well with the hypothesis of charge-stabilized α-helical conformations.
Second, we examined the learned spline functions for peptides whose internal basic site was far from the C-terminus (FigureA, blue). Notably, their spline patterns closely resemble those of the previous subgroupwhere the internal basic site was near the C-terminussuggesting that peptides in this subgroup may also adopt similar charge-stabilized helical conformations. We speculate that these conformations might involve two positive charges near the C-terminus (even though there is only one basic site there), with either the internal basic site or the N-terminal amine being unprotonated. Although speculative, this scenario is plausible: our results above suggest that charge localization near the C-terminus may be favorable due to potential alignment with an α-helix macrodipole (Figure). Consistent with this idea, one of the strongest contributions for promoting the high CCS mode in this peptide subgroup occurs when the internal basic site is at the N-terminus and is a histidine (but much less so when it is an arginine or a lysine) (FiguresA; Supplementary S3). One plausible explanation for this pattern is that strong Coulombic repulsion at the N-terminus, combined with histidine having lower basicity compared to arginine and lysine, could make double protonation of the N-terminal amine and histidine less favorable energetically. This would promote localization of one of those charges near the C-terminal end, again potentially stabilizing an extended helix-like conformation. In summary, the formation of charge-stabilized helices provides a coherent explanation for the high CCS mode in this subgroup.
Together, our GAM analysis reveals sequence features consistent with the hypothesis that extended, charge-stabilized helical conformations underlie the high CCS mode. The position-specific influences of amino acid residuesespecially the propensity of hydrophobic residues to promote helices, the destabilizing effects of helix-breaking residues like proline, and the differential effectiveness of lysine and arginine in C-terminal helix cappingall lend strong plausibility to this structural model.
Sequence Features Influencing Absolute CCS within Each Mode
Having examined how sequence features influence the CCS mode, we next investigated how they determine CCS within each mode. As mentioned before, we used the set of unmodified tryptic peptides with exactly three basic sites and lengths of at least 15 residues. We then divided these peptides into high and low CCS mode subgroups, training a linear GAM on each to predict absolute CCS (see the Methods). These models allowed us to examine how specific residues and sequence patterns contribute to shifts in absolute CCS, rather than transitions between distinct classes of conformational states.
The GAMs capture intuitive relationships: CCS increases with peptide length in both subgroups (FigureB), consistent with the physical intuition that longer peptides generally occupy more space. Additionally, the average value of the amino acid splines correlates strongly with residue mass (R ^2^ = 0.83; FigureC), reiterating the mass-CCS correlation observed earlier (FigureA).
Linear GAM models predicting absolute CCS. (A,B) GAM models trained to predict absolute CCS. Shaded regions indicate 95% confidence intervals. Each amino acid (except the C-terminal one) contributes additively to the final prediction depending on its relative position along the peptide, as shown in the corresponding panel (A). The additive contribution of length and of the C-terminal amino acid are shown in (B). C† refers to carbamidomethylated cysteine; R, K* refer to non-C-terminal arginine and lysine, respectively. (C) The contribution of each amino acid, averaged over all positions and over the two modes, as a function of its residue mass. (D,E) MD simulations of the high mode series (D) and the low mode series (E). Also shown are representative conformations. Glycine triplet is highlighted in yellow.*
Although the learned splines from both GAMs exhibit similar average values (Pearson’s r = 0.98 across amino acids), their position-dependent patterns differ notably. In particular, the amino acid splines for the low-mode subgroup are generally flat (FigureA, blue); that is, each amino acid residue contributes similarly to CCS regardless of their position. This aligns with the notion that the low-mode subgroup primarily comprises random, globular conformations, where residues influence CCS predominantly through their size rather than their specific positions. In contrast, amino acid splines for the high-mode subgroup show positional dependence (FigureA, red). For instance, tryptophan increases CCS by ∼7 Å^2^ when located closer to the peptide termini compared with a central position. A possible explanation is that in α-helical conformations, tryptophan’s bulky side chain is more exposed near the peptide termini, whereas the helix may shield it when it is positioned more centrally.
Surprisingly, glycine also exhibits pronounced position dependence: in the high-mode subgroup, glycine increased CCS by ∼10 Å^2^ when centrally located, compared to positions near the peptide termini (FigureB, red). To further investigate this positional effect of glycine, we turned to MD simulations. We selected a peptide from the high-mode subgroup containing a glycine triplet (GGGNQVSLLNVVMDLKK), then generated a series of peptides differing only by the position of this glycine triplet by sliding it across consecutive positions. We then ran simulations for each peptide in this series, computing the solvent accessible surface area (SASA) as a proxy for CCS (see the Methods). We observed that the peptides with the glycine triplet located in the center of the peptide indeed tended to exhibit larger conformations than those with the triplet located at the ends (FigureD). Examining representative conformations, we found that when the glycine triplet was located near the termini, peptides formed α-helical conformations capped by the double lysine at the C-terminus (FiguresD, lower left and right diagrams; ?E). However, when the glycine triplet was located in the center, the peptides tended to form elongated, extended conformations, with low α-helical content (FiguresD, upper diagrams; Supplementary S4A, S5A). These simulations indicate that glycine may differentially influence extended conformers by introducing local flexibility that perturbs a helical structure when positioned at the center of the peptide. Interestingly, the presence of a glycine triplet toward the center of the peptide also induced larger variance in the simulated SASA values due to the presence of globular conformations with lower SASA values (FigureD, center diagram). This aligns with the notion that glycine can also destabilize extended conformations and promote compact conformers.
Lastly, we verified the observed glycine spline for the low CCS mode by repeating the MD simulations but for a different series generated using a peptide with a glycine triplet from the low-mode subgroup (LAGGGHVGFDNATFLSER). Indeed, we observed that the simulated SASA values did not vary strongly with the position of the glycine triplet (FigureE) and that the α-helical content was consistently low (Supplementary Figures S4B, S5B). Moreover, we observed that representative conformations for these peptides were compact globular peptides, consistent with our hypothesis (FigureE, diagrams).
Taken together, these results highlight how the amino acid identity and sequence position jointly influence peptide CCS within conformational modes. Our GAM and MD analyses underscore how subtle positional effects can significantly modulate the peptide structure.
Summary and Conclusion
In this study, we investigated the sequence determinants underlying the CCS bimodality in charge 3^+^ peptides. We found that the CCS mode is strongly influenced by basic site positioning: peptides with basic sites distributed toward the C-terminus strongly favor the high mode. Together with our MD simulations, these findings support the hypothesis that such peptides often adopt extended helix-containing conformations. This extends previous observations from polyalanine systems and suggests that α-helical structures are prevalent in the gas phase across a broad range of peptides. We also found evidence consistent with the hypothesis that residues near the C-terminus are preferentially protonated due to helix macrodipole effects. These insights may inform MD simulations of the gas-phase proton dynamics. Lastly, our GAM analyses revealed additional sequence features associated with the CCS mode and absolute CCS. These sequence features correlate with the physicochemical properties of peptides and offer additional insights into the conformational landscape underlying peptide CCS.
Our study has a few limitations. First, the peptide ions in the large-scale data set of Meier et al.? were each assigned the strongest CCS peak, even though some peptides can exhibit other secondary CCS peaks.? However, peak CCS has been demonstrated to be extremely consistent (an accuracy of <1%),? suggesting that it provides a robust measure for a peptide’s preferred gas-phase conformation. Second, our analysis is influenced by the distribution of peptides in the selected data set. Our use of a data set spanning the proteomes of multiple organisms partly mitigates this issue. This can be further addressed by including peptides generated with more enzymes or synthetic peptides. Third, because to our knowledge no biomolecular force field for in vacuo simulations is available,? we used a force field that has been parametrized using experimental data and MD simulations in solution. As a result, our simulations may be less accurate than those in solution; based on previous experience,? however, we think they are sufficiently accurate to support the qualitative conclusions presented here.
In conclusion, our analysis has demonstrated sequence determinants that can largely explain the CCS bimodality for a wide range of peptides. These findings provide insights that inform our understanding of the gas-phase conformations of the peptides and the dynamics of their charges.
Supplementary Material
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Lanucara F.Holman S. W.Gray C. J.Eyers C. E.The power of ion mobility-mass spectrometry for structural characterization and the study of conformational dynamics Nat. Chem.2014628129410.1038/nchem.188924651194 · doi ↗ · pubmed ↗
- 2Baker E. S.Livesay E. A.Orton D. J.Moore R. J.Danielson W. F.Prior D. C.Ibrahim Y. M.La Marche B. L.Mayampurath A. M.Schepmoes A. A.Hopkins D. F.Tang K.Smith R. D.Belov M. E.An LC-IMS-MS platform providing increased dynamic range for high-throughput proteomic studies J. Proteome Res.20109997100610.1021/pr 900888 b 20000344 PMC 2819092 · doi ↗ · pubmed ↗
- 3Zhong Y.Hyung S.-J.Ruotolo B. T.Ion mobility-mass spectrometry for structural proteomics Expert Rev. Proteomics 20129475810.1586/epr.11.7522292823 PMC 3313071 · doi ↗ · pubmed ↗
- 4Meier F.Beck S.Grassl N.Lubeck M.Park M. A.Raether O.Mann M.Parallel accumulation-serial fragmentation (PASEF): Multiplying sequencing speed and sensitivity by synchronized scans in a trapped ion mobility device J. Proteome Res.2015145378538710.1021/acs.jproteome.5b 0093226538118 · doi ↗ · pubmed ↗
- 5Mäkinen M. A.Anttalainen O. A.SillanpääM. E. T.Ion mobility spectrometry and its applications in detection of chemical warfare agents Anal. Chem.2010829594960010.1021/ac 100931 n 20979419 · doi ↗ · pubmed ↗
- 6Kanu A. B.Dwivedi P.Tam M.Matz L.Hill Jr H. H.Ion mobility-mass spectrometry J. Mass Spectrom.20084312210.1002/jms.138318200615 · doi ↗ · pubmed ↗
- 7Dodds J. N.Baker E. S.Ion mobility spectrometry: Fundamental concepts, instrumentation, applications, and the road ahead J. Am. Soc. Mass Spectrom.2019302185219510.1007/s 13361-019-02288-231493234 PMC 6832852 · doi ↗ · pubmed ↗
- 8Artigues, A. ; Nadeau, O. W. ; Rimmer, M. A. ; Villar, M. T. ; Du, X. ; Fenton, A. W. ; Carlson, G. M. Modern Proteomics – Sample Preparation, Analysis and Practical Applications, Marcus, K. , Eisenacher, M. , Sitek, B. , Eds.; Springer: 2016; Vol. 919, pp 397–431.10.1007/978-3-319-41448-5_19PMC 527159927975228 · doi ↗ · pubmed ↗
