Complete end-to-end learning from protein feature representation to protein interactome inference
Yu-Hsin Chen, Chien-Fu Liu, Jun-Yi Leu, Huai-Kuang Tsai

TL;DR
FREEPII is a deep learning framework that improves the accuracy and scalability of predicting protein interactions using mass spectrometry data and sequence features.
Contribution
FREEPII introduces an end-to-end deep learning model that integrates CF-MS data and sequence features for robust and generalizable protein interaction inference.
Findings
FREEPII outperforms existing tools in capturing biologically coherent and discriminative protein features.
Multimodal data integration improves model generalization and sensitivity across datasets and species.
Supervised embeddings capture higher-order interaction contexts for reliable discovery of novel interactions.
Abstract
Co-fractionation coupled with mass spectrometry (CF-MS) is a powerful strategy for mapping protein–protein interactions (PPIs) under near-physiological conditions. Despite recent progress, existing analysis pipelines remain constrained by reliance on handcrafted features, sensitivity to experimental noise, and an inherent focus on pairwise interactions, which limit their scalability and generalizability. To address these difficulties, we introduce FREEPII (Feature Representation Enhancement End-to-End Protein Interaction Inference), a unified deep learning framework that integrates CF-MS data with sequence-derived features to learn biologically meaningful protein-level representations for accurate and efficient inference of PPIs and protein complexes. FREEPII employs a convolutional neural network architecture to learn protein-level representations directly from raw data, enabling…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5| Complex | Data | Model | % of overlap in complex | % of overlap in cluster | Jaccard similarity |
|---|---|---|---|---|---|
| Mediator | H1 | FREEPII |
|
|
|
| SPIFFED | 12.50 | 22.22 | 0.087 | ||
| EPIC | 12.50 | 5.00 | 0.037 | ||
| H2 | FREEPII |
|
|
| |
| SPIFFED | 13.04 | 13.64 | 0.071 | ||
| EPIC | 13.04 | 4.48 | 0.034 | ||
| Cytoplasmic ribosomal large subunit | Y1 | FREEPII |
|
|
|
| SPIFFED | 62.50 | 36.76 | 0.301 | ||
| EPIC | 70.00 | 54.90 | 0.444 | ||
| Y2 | FREEPII |
| 75.93 |
| |
| SPIFFED | 41.46 |
| 0.378 | ||
| EPIC | 53.66 | 70.97 | 0.440 |
| Species | Accession | Experimental name | Abbreviation | Number of fractionations | Number of proteins | Number of proteins in protein complexes |
|---|---|---|---|---|---|---|
| Human | PXD002892 | SEC2_H | H1 | 55 | 4,002 | 1,720 |
| Human | PXD002892 | SEC3_H | H2 | 55 | 4,563 | 1,909 |
| Human | PXD014820 | Ctrl | H3 | 61 | 5,268 | 1,986 |
| Human | PXD015406 | Control | H4 | 61 | 6,043 | 2,225 |
| Yeast | PXD031967 | Hsp90_20200122_H | Y1 | 27 | 2,397 | 1,089 |
| Yeast | PXD031967 | Ctrl_20200416_H | Y2 | 27 | 2,753 | 1,218 |
| Yeast | PXD031967 | Ctrl_20191126_H | Y3 | 27 | 2,026 | 933 |
| Yeast | PXD031967 | Hsp90_20201028_H | Y4 | 27 | 2,952 | 1,309 |
- —Academia Sinica10.13039/501100001869
- —National Science and Technology Council10.13039/100020595
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBioinformatics and Genomic Networks · Machine Learning in Bioinformatics · Protein Structure and Dynamics
Background
Proteins play a central role in biological processes such as catalytic reactions, signal transduction, immune responses, and molecule transportation [1–3]. These biological activities are often executed or regulated through protein–protein interactions (PPIs), forming a complex network called the protein interactome. Deciphering the local structure of the protein interactome (protein complexes), as well as completing the global structure of the protein interactome, is critical for understanding cellular functions and disease mechanisms [4–6]. Several techniques have been used to analyze transient and stable PPIs and protein complexes, including yeast 2-hybrid (Y2H) screens [7–9], affinity purification coupled to tandem mass spectrometry (AP-MS) [10, 11], and protein co-fractionation coupled to mass spectrometry (CF-MS) [12–16].
To date, the results of Y2H and AP-MS analysis have been used to establish approximately 53,000 [8] and over 56,000 [6] protein–protein interactions in human cells, respectively. Nonetheless, both systems have their limitations. The Y2H screen assay can only detect binary interactions in 1 experiment and requires the target proteins to be expressed in yeast cells, which may affect PPIs due to inappropriate modifications or incorrect localizations of proteins. Additionally, Y2H screens cannot determine PPIs in a specific cellular state. Instead, AP-MS can obtain protein interactions under specific conditions and map multiple protein interactions in parallel. However, AP-MS often requires specific antibodies or genetic engineering, which may alter protein structure and interaction sites. Furthermore, AP-MS can only detect stable interactions and will miss unstable or transient interactions.
CF-MS offers a powerful, high-throughput strategy for mapping PPIs at scale. Unlike genetic or affinity-based methods, CF-MS operates without exogenous perturbations, enabling the interrogation of protein associations under near-physiological conditions. The core principle of CF-MS involves measuring coelution profiles across chromatographic fractions and inferring PPIs through correlation-based scoring and downstream graph partitioning algorithms for de novo protein complex detection [12, 15]. However, the sensitivity and specificity of these inferences are often constrained by the reliance on hand-crafted features [17–21], which compress the data and introduce biases. Moreover, CF-MS data are inherently noisy, further complicating accurate interaction assignment. To address these limitations, our previous work introduced SPIFFED, a convolutional neural network (CNN)–based end-to-end model that bypasses handcrafted features by directly learning informative features from CF-MS data [22]. SPIFFED outperformed PrInCE [18, 20] and EPIC [19], 2 widely adopted and representative CF-MS analysis tools that rely on predefined similarity metrics and manually selected features, validating the advantage of deep learning in CF-MS analysis. Despite this progress, SPIFFED retains architectural limitations: it models interactions strictly at the protein-pair level, which hinders the learning of transferable protein-level features and limits the capture of higher-order interaction patterns. Additionally, its reliance solely on CF-MS coelution data limits sensitivity when signals are weak or ambiguous.
Here, we introduce FREEPII (Feature Representation Enhancement End-to-End Protein Interaction Inference), a unified deep learning framework that integrates CF-MS data with sequence-derived features and learns discriminative protein-level representations for accurate and scalable protein interaction inference. FREEPII also adopts a CNN-based architecture that learns directly from raw elution profiles while constructing protein-level representations shared across interaction pairs. This design reduces computational complexity from 2N(N − 1)M to N**M (where N is the number of proteins and M is the number of CF-MS fractions), enabling more consistent and transferable feature learning. To enhance sensitivity, particularly when CF-MS signals are weak or ambiguous, FREEPII incorporates protein sequence features as auxiliary inputs. In addition, the supervised protein embeddings capture higher-order interaction contexts from complex-level annotations, enabling the model to internalize network-level dependencies beyond pairwise interactions and enhance representation learning in a fully end-to-end manner [23, 24]. Through comprehensive evaluation, FREEPII consistently outperforms established CF-MS analysis tools, including the widely recognized handcrafted feature-based tools EPIC and our previous deep learning–based baseline SPIFFED, demonstrating superior performance in both interaction prediction and complex-level organization. It highlights the effectiveness of FREEPII’s design in supporting accurate inference across diverse biological contexts. The examination of the learned representations illustrates that FREEPII’s architectural design facilitates the learning of more informative and discriminative protein representations. Cross-dataset experiments further demonstrate that integrating multimodal data across diverse experimental settings substantially improves generalization and sensitivity, validating the approach’s robustness for scalable, cross-species protein interaction inference. The source code of this study is freely available at https://github.com/qqpigass/FREEPII.
Results
Design and workflow of FREEPII for protein-level representation learning
Figure 1 outlines the analysis workflow of FREEPII, a deep learning framework that integrates CF-MS data and sequence features to construct informative protein representations for downstream protein interaction inference. Each protein sequence is first encoded into a 1-dimensional flattened frequency matrix chaos game representation (FCGR) vector, which is stacked across proteins to form the FCGR matrix. In parallel, a trainable protein embedding matrix is initialized, and elution profiles from all proteins are assembled into a CF-MS matrix. To address potential scale mismatches, both CF-MS and FCGR inputs are individually normalized to the [0, 1] range. The FCGR and embedding matrices are then combined via element-wise addition and subsequently concatenated with the CF-MS matrix to form a unified multimodal input for representation learning.
Analysis pipeline of FREEPII. The input of FREEPII consists of 3 parts, including CF-MS data containing information about protein interactions in experiments (matrix A), protein sequence data (in FCGR form; see Methods) with conserved information between proteins (matrix B), and protein embeddings (matrix C) that encapsulate protein interaction information in protein complexes after training. To form the final input, matrix B is added to matrix C and then is concatenated with matrix A. FREEPII completes feature extraction and map creation through a convolutional layer and generates protein feature representations by flattening the feature map along the filter dimension. The protein-pair index is used as an additional input to extract and construct PPI feature representations. The residual connection strategy is used when constructing feature representations of proteins and PPIs; the latter are then fed into a fully connected layer. The output, comprising predicted PPI scores together with the protein-pairing index, is subsequently used for clustering analysis to predict protein complexes. Notably, N represents the number of proteins, M denotes the number of fractionations, M′ is the length of the flattened FCGR of the protein sequence and protein embedding, and F indicates the length of a protein’s feature representation.
This multimodal input is processed through a CNN, which automatically extracts features and generates protein-level representations. Protein-pair representations are then constructed by retrieving and concatenating individual representations based on predefined pairing indices, eliminating the need for additional pairwise encoders. These features are subsequently used for downstream tasks such as interaction prediction and clustering. To facilitate training, residual connections [25, 26] are incorporated at both the protein and protein-pair levels to improve convergence and model stability. Furthermore, during model optimization, supervised protein embeddings transfer network-level information from complex labels into the feature space, guiding the feature extraction process toward learning interprotein dependencies.
FREEPII delivers robust and accurate protein interaction prediction
To evaluate FREEPII’s predictive power, we benchmarked its PPI classification performance against 2 representative CF-MS analysis tools: EPIC and SPIFFED. EPIC relies on manually engineered similarity metrics, while SPIFFED employs a deep learning framework but focuses solely on pairwise signal variation. FREEPII consistently outperforms both methods across multiple metrics, including sensitivity, specificity, Matthews correlation coefficient (MCC), and receiver operating characteristic (ROC) area under the curve (AUC), on human and yeast datasets (Fig. 2A). Specifically, FREEPII achieved average improvements of 0.150/0.093 in sensitivity, 0.103/0.039 in specificity, 0.250/0.135 in MCC, and 0.108/0.063 in AUC over SPIFFED for human/yeast datasets, respectively.
PPI classification performance. Four scoring metrics, including sensitivity, specificity, MCC, and AUC of ROC, are used to evaluate the performance of (A) existing CF-MS analysis tools and (B) models with various structures and inputs in the ablation study on the PPI classification task. All results are based on predictions from the testing set. Specifically, RF-C represents the random forest model using only CF-MS features as input, CNN-C denotes CNN using solely CF-MS data as input without incorporating protein sequences, and FREEPII(-) indicates FREEPII without incorporating protein embeddings.
This performance advantage stems from FREEPII’s 3 architectural innovations: (i) learning task-relevant features directly from data, (ii) incorporating orthogonal biological information to enrich feature space, and (iii) refining protein representations by incorporating network-level context through protein embeddings. Ablation experiments (Fig. 2B, Supplementary Fig. S3) confirmed the additive value of each component. The CNN trained solely on CF-MS already surpassed the traditional random forest (RF)–based model. Adding sequence input further improved sensitivity by reducing false negatives arising from ambiguous or weak CF-MS signals, while protein embeddings enhanced both sensitivity and specificity, highlighting their role in strengthening representation quality. These results illustrate the effectiveness of FREEPII’s modular and biologically informed architecture in resolving complex interaction patterns that remain inaccessible to existing CF-MS tools.
The discriminative power of FREEPII from both CF-MS data and protein sequences
To better understand how FREEPII combines CF-MS data and protein sequence information when predicting PPIs, we used saliency maps to examine the contribution of each input to individual predictions (see Methods). As shown in Fig. 3A, FREEPII draws on both CF-MS data and sequence-based features with comparable frequency across most PPIs, suggesting that both inputs play essential and often complementary roles in the classification process. For a subset of interactions, however, FREEPII places greater emphasis on CF-MS data, while relatively few PPIs are predicted primarily using sequence information. Notably, when CF-MS input dominates, the model tends to focus on earlier elution fractions (Fig. 3B), a region that often reflects interaction states, as interacting proteins tend to form larger assemblies that elute earlier. In contrast, when predictions are sequence-driven, CF-MS input contributes minimally, suggesting that interaction-relevant cues are directly embedded in the primary sequence in some cases.
*FREEPII exploits 2 inputs for classification and learns discriminative feature representations. (A) Average difference in intensity between CF-MS and protein sequence input regions using the saliency map. This is generated by applying FREEPII to the yeast dataset Y2. It visualizes the intensity contrast between CF-MS and protein sequence input regions, derived from 10,000 PPIs. (B) The saliency maps depicted 3 categories of PPIs, CF-MS∼SEQ, CF-MS<SEQ, and CF-MS>SEQ, based on the degree to which FREEPII considers coelution profiles or protein sequences during prediction. The prediction results are derived from FREEPII’s application to yeast dataset Y2. For clarity, the padding part of CF-MS data has been removed, and only the first 100 units of the protein sequence region are displayed to avoid compressing the CF-MS data. A gray line is manually inserted to separate the 2 inputs visually. (C) Feature representations of proteins learned by models via t-SNE (using yeast dataset Y2). The t-SNE visualization for all human and yeast datasets is shown in Supplementary Figs. S4 and S5, respectively. Different colors represent labels for different protein complexes. For clarity, only 11 yeast protein complexes are labeled. Cosine distances between pairs of protein feature representations within and between protein complexes are calculated, and the Kruskal–Wallis test is used to assess whether the difference in distance distribution between the 2 groups is significant. ns: P > 0.05, *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001, ***P ≤ 0.0001.
This flexible use of multimodal information suggests that FREEPII adjusts its prediction strategy based on the nature of the input, rather than applying a uniform rule. While CF-MS data serve as the primary source of predictive signal, sequence-derived features provide complementary cues that help refine the model’s interpretation when coelution patterns alone are ambiguous. The ability to adaptively incorporate auxiliary sequence information helps FREEPII align its input focus with the available biological context.
FREEPII captures higher-order interaction patterns via multimodal and embedding-guided learning
To assess whether the learned protein representations capture a higher-order interaction structure, we conducted both qualitative and quantitative analyses across different model configurations. Specifically, we applied t-distributed stochastic neighbor embedding (t-SNE) to project representations into 2 dimensions for visual comparison and computed pairwise cosine distances to assess structural consistency in the representation space (Fig. 3C, Supplementary Figs. S4, S5). Protein complex annotations were used as labels in both analyses. For visual clarity, a representative subset of 9 human and 11 yeast protein complexes was selected for display.
As shown in Fig. 3C, the sequence-only model (CNN-S) yielded representations that lacked alignment with complex annotations, indicating limited ability to capture complex-level organization. Adding protein embeddings (CNN-SE) markedly improved this organization, resulting in tighter intracomplex groupings and better concordance with complex labels. FREEPII further refined the feature space by integrating CF-MS data alongside sequence and embedding inputs. As shown in Supplementary Figs. S4 and S5, its representations showed enhanced compactness and clearer separation between complexes compared to CNN-SE. For example, in human datasets H3 and H4, FREEPII distinctly separated the 28S and 39S mitochondrial ribosomal subunits, which remained partially overlapping in CNN-SE. Similar improvements were seen in yeast, particularly for the mitochondrial large ribosomal subunit. These observations were supported by cosine distance analysis, with CNN-S showing little difference between intra- and intercomplex distances, while both CNN-SE and FREEPII exhibited significantly lower intracomplex distances. These results demonstrate that protein embeddings effectively guide the learning of sequence-based representations toward biologically meaningful groupings and that CF-MS data further enrich these representations by capturing dynamic interaction patterns.
FREEPII clusters proteins with superior structural, functional, and spatial coherence
Building on its strong performance in PPI prediction and its ability to learn structured representations, we next evaluated FREEPII’s effectiveness in identifying biologically meaningful protein clusters, in comparison with EPIC and SPIFFED. We first applied the composite score to assess the structural consistency between predicted clusters and reference protein complex annotations. FREEPII achieved the highest composite scores across both human and yeast datasets (Fig. 4A), indicating superior structural resolution. To move beyond curated references, we next assessed the functional coherence of predicted clusters using the GOGO score [27], which quantifies semantic similarity in gene ontology without relying on predefined complex boundaries. FREEPII again outperformed competing models across nearly all human datasets and showed the strongest results in the yeast biological process (BP) ontology (Fig. 4B, Supplementary Table S1), demonstrating its ability to group proteins based on shared biological roles. We further evaluated spatial coherence by computing subcellular colocalization scores for each predicted cluster. FREEPII consistently achieved the highest scores in both species (Fig. 4C, Supplementary Table S2), suggesting that its clustering aligns with known subcellular localization patterns. Notably, even when the protein embedding component was removed, the multimodal version of FREEPII (FREEPII(-)) still outperformed all single-modality models in clustering accuracy (Supplementary Fig. S6, Supplementary Tables S3 and S4), highlighting the benefit of integrating heterogeneous biological information. Collectively, these findings underscore FREEPII’s capacity to unify fine-grained interaction predictions with biologically coherent clustering, outperforming current CF-MS tools across multiple organizational levels.
*Cluster quality assessment. To assess the quality of clusters produced by each model’s predicted PPIs, 3 scoring metrics, including composite, GOGO, and colocalization scores, are employed. (A) The evaluation through the composite score, which is the sum of overlap score (red), accuracy (blue), and MMR (green). (B, C) GOGO (BP ontology; see Methods) and colocalization scores, respectively. The Wilcoxon rank-sum test was used to assess whether the distribution of GOGO scores or colocalization scores between the 2 groups was significantly different. *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001, ***P ≤ 0.0001.
FREEPII successfully identifies known and novel biologically relevant protein complexes
To further illustrate the practical utility of FREEPII, we present 2 types of case studies that demonstrate its ability to identify both well-characterized and novel protein complexes with strong biological relevance.
In the first case, FREEPII successfully reconstructed canonical complexes documented in the curated gold standard, such as the human Mediator complex and the yeast cytoplasmic ribosomal large subunit (see Table 1). These predicted clusters consistently achieved high Jaccard index scores across biological replicates, markedly surpassing those generated by SPIFFED and EPIC, which often exhibited lower overlap and greater variability. Importantly, the superior overlap observed in FREEPII’s predictions is not an artifact of cluster size but a result of its enhanced capacity to resolve biologically meaningful organization within complex CF-MS datasets. In the second case, we investigated novel clusters predicted by FREEPII that are absent from the gold standard but supported by independent experimental evidence. For example, in the Y1 dataset, FREEPII identified a cluster comprising RNQ1, PBP1, PBP4, LSM12, MAK11, SLK19, and GDE1. Although this cluster has no counterpart in the reference annotations, interactions among PBP1, PBP4, and LSM12 have been previously reported [28], supporting the biological plausibility of this prediction. Similarly, in the Y4 dataset, FREEPII predicted a cluster including FAF1, IBD2, LOC1, MRM1, and RCM1, among which the interaction between LOC1 and RCM1 was recently validated by an independent study [29]. These case studies highlight FREEPII’s effectiveness in recovering both established and previously unannotated protein complexes, reinforcing its value for real-world bioanalysis.
Experimental and biological heterogeneity contributes to improved generalization performance
Given the varied resolutions of currently available CF-MS data across experiments and species, it is of interest to examine whether incorporating datasets with different numbers of fractionations and species origins can improve prediction performance. To address this, we evaluated the generalization ability of FREEPII through a series of cross-prediction experiments using combinations of human and yeast datasets. Since the supervised protein embedding layer in FREEPII is tailored to fixed label structures and input dimensions, we employed a variant, FREEPII(-), that omits the embedding component to enable flexible training and testing across datasets. Specifically, we trained models on 1 or multiple CF-MS datasets and assessed their performance on others with distinct species and resolution characteristics.
As shown in Fig. 5A, when models were trained on a single dataset (H1 or Y1), the RF-based model (RF-C) generally outperformed the CNN-based models in cross-species scenarios. This reflects a known limitation of data-driven models trained on narrow, domain-specific inputs and underscores the relative robustness of predefined, formula-based features, which are less affected by resolution discrepancies. However, when additional datasets were incorporated during training (e.g., H1 + Y1 or H1 + H3 + Y1), the performance of CNN-based models improved substantially. Notably, FREEPII(-) achieved the highest MCC across all test sets when trained on the most diverse dataset combination, clearly surpassing CNN-C and significantly outperforming RF-C (Fig. 5A). This trend is further supported by Fig. 5B, where the MCC of RF-C increased marginally by 0.05, while CNN-C and FREEPII(-) showed greater improvements of 0.15 and 0.17, respectively. These findings highlight the advantages of FREEPII(-), which combines the flexibility of data-driven feature learning with the added value of orthogonal biological information. While RF models are inherently limited by static, handcrafted features that restrict adaptability, FREEPII(-) effectively learns transferable and biological meaningful representations. This makes it particularly well suited for scalable, cross-species protein interaction inference when trained on datasets with diverse biological origins and experimental resolutions.
*Cross-prediction and co-training. (A) Models were trained on either a single coelution dataset (H1 or Y1) or co-trained using multiple training sets (H1 + Y1 and H1 + H3 + Y1). The prediction performance is evaluated on their own testing set and on the testing sets of other coelution datasets. The x-axis indicates the training datasets, while the panels represent the species names for the testing set. Each boxplot is the distribution of performance scores of the trained model on the testing sets across all experiments for that species. The Wilcoxon rank-sum test was used to assess whether the MCC scores between the 2 groups was significantly different. *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001, ***P ≤ 0.0001. (B) The growth curve displays the average performance score of each model across all testing sets, comparing the scheme of single-training (H1/Y1) to co-training (H1 + Y1, H1 + H3 + Y1). To ensure balanced contributions of different training sets to model learning, we down-sampled the training pairing indices from various training sets to match the smallest training set’s index.
Discussion
In this study, we present FREEPII, a deep learning framework for protein–protein interaction and protein complex inference that integrates CF-MS data, protein sequences, and supervised protein embeddings to learn informative protein-level representations through an end-to-end CNN-based architecture. Across all classification metrics on both human and yeast datasets, FREEPII consistently outperformed EPIC and SPIFFED—2 state-of-the-art CF-MS analysis tools—demonstrating superior sensitivity, specificity, and overall reliability in PPI prediction (Fig. 2A). We also evaluated FREEPII against Tapioca [21], a recently developed tool designed for CF-MS–based PPI prediction. On our datasets, Tapioca showed limited predictive performance, broadly consistent with other handcrafted feature-based baselines such as EPIC. Given its lack of retraining capability, we report these results in the Supplementary Materials (Supplementary Table S5) for completeness. Our results reveal that each input modality contributes distinct and complementary strengths to the model. CF-MS data provide dynamic signals that form the backbone of interaction inference. Sequence-based features offer orthogonal biological context, helping resolve ambiguous or weak coelution signals. Supervised protein embeddings encode higher-order structural information from protein complex labels, guiding the feature extraction process toward capturing interprotein dependencies (Fig. 3). This multimodal strategy enables the learning of structured, discriminative protein representations that not only support superior pairwise interaction inference but also lead to biologically coherent clustering with top performance across structural, functional, and spatial benchmarks, showing its comprehensive performance (Fig. 4). Overall, we demonstrate that FREEPII is a conceptually innovative and empirically supported method that significantly outperforms existing CF-MS analytical methods.
Although both FREEPII and SPIFFED adopt CNN-based architectures, FREEPII incorporates several architectural enhancements to improve computational efficiency, training stability, and representational capacity. First, FREEPII encodes features at the individual protein level rather than at the protein-pair level, allowing feature sharing across protein interactions, reducing the number of convolutional parameters by nearly half compared to SPIFFED. Second, FREEPII employs residual connections, inspired by ResNet [25] and Transformer [26], to improve gradient flow, stabilize training, and accelerate convergence. Third, FREEPII incorporates a supervised embedding layer to guide the modeling of interprotein dependencies. Given that FREEPII’s architectural design yields clear performance improvements, we further examined whether it also confers concrete benefits in computational efficiency relative to SPIFFED. To this end, we constructed 2 matched CNN variants, referred to as the FREEPII-like and SPIFFED-like models, and benchmarked them under controlled conditions using CF-MS data as the sole input, consistent with the single-modality framework employed by SPIFFED (Supplementary Table S6). The primary differences between these variants lie in whether convolution operates on individual proteins or protein pairs, as well as whether residual connections are included during training. Both models were trained under identical hardware settings (1.5 TB RAM CPU environment), and memory usage was recorded at peak training load. For time complexity, we compared the time required to reach a predefined classification accuracy threshold (0.8 for the H1 dataset), marking the transition from early training to convergence. As shown in Supplementary Table S7, Supplementary Table S8, and Supplementary Fig. S7, the FREEPII-like model consistently exhibited lower memory consumption and faster convergence across all experimental configurations. Notably, these advantages became more pronounced with increasing model complexity (e.g., 128 and 256 filters). Taken together, these results underscore the practical efficiency of FREEPII’s design, showing that its streamlined architecture not only improves learning dynamics but also reduces computational cost.
In Fig. 2B, we observed that CNN models outperform RF classifiers more substantially on human datasets than on yeast. This likely stems from differences in data resolution, as human CF-MS datasets contain more fractionation channels and thus yield more complex and information-rich elution profiles. Such higher-resolution data often challenge handcrafted feature extraction used in RF classifiers, which compresses signals and can result in the loss of critical information. In contrast, CNNs learn directly from the raw input space, enabling them to retain informative patterns and maintain robust performance as data complexity increases. We also found that incorporating sequence-derived features as an additional input further improves model sensitivity, particularly by reducing false negatives when coelution signals are weak or ambiguous. This observation aligns with previous work showing that multimodal integration enhances model sensitivity [30, 31]. FREEPII readily accommodates diverse sequence-based modalities, including gene expression profiles and structural embeddings derived from pretrained models [32, 33], underscoring its flexibility and extensibility. When combined with supervised protein embeddings, FREEPII achieves marked improvements in both sensitivity and specificity, demonstrating the value of incorporating higher-order interaction information into the representation learning process. Prior work by Singh et al. [34] also emphasized the importance of network-level information in improving PPI prediction, but their method relied on explicit manipulation of the adjacency matrix, limiting training coherence. In contrast, FREEPII integrates this information in a seamless, end-to-end manner. Beyond enhancing biological relevance, the protein embeddings also function as an internal correction mechanism: when sequence-derived features are missing, it can help impute the input [35], thereby improving model robustness and data integrity.
In FREEPII, protein embeddings are integrated with sequence-derived features through element-wise addition. This strategy enables the embeddings to reinforce biologically meaningful sequence patterns without increasing input dimensionality, allowing a more expressive feature space at no additional computational cost. While protein embeddings can improve representation learning, they may also introduce challenges. We observed that a CNN model trained solely on protein sequence inputs with protein embeddings tends to overpredict positive PPIs when evaluated on the full interaction set (including training, testing, and experimental PPIs) (Supplementary Fig. S8). This occurs because embedding vectors are optimized only for proteins included in the training set, resulting in uninitialized or poorly trained embeddings for unseen proteins. When inputs such as FCGR exhibit values numerically close to untrained embeddings, their additive combination during integration may introduce spurious signals, thereby compromising classification accuracy. FREEPII addresses this issue through a careful architectural choice: instead of summing CF-MS and sequence-derived features, it concatenates them to preserve the integrity and independent contribution of each modality. This design ensures that the more dynamically informative and numerically dominant CF-MS input remains the primary driver of the model’s predictions, while enhanced-sequence features contribute complementary signals. As a result, the influence of untrained embeddings is substantially diminished. As shown in Supplementary Fig. S8, this approach produces a prediction distribution that closely aligns with the CF-MS–only baseline, supporting the robustness and reliability of FREEPII’s multi-input integration strategy.
To manage the increased complexity introduced by multimodal inputs and the added parameter burden from protein embeddings (Supplementary Tables S9, S10), especially under limited training data, FREEPII incorporates several regularization strategies to improve model stability and generalizability. These include dropout, weight decay, and residual connections, which collectively help constrain the solution space and mitigate overfitting. As illustrated by the learning curves of FREEPII (Supplementary Fig. S9), the testing loss consistently decreases and stabilizes throughout training, with no noticeable divergence from the training loss—indicating effective regularization and minimal overfitting. Furthermore, FREEPII attains a classification accuracy of approximately 0.9 across nearly all test datasets (Fig. 2), reflecting its strong generalization performance. While a slightly larger gap between training and testing loss is observed for the yeast dataset relative to the human data, this is likely attributable to the yeast dataset’s smaller size and simpler interaction space. Nonetheless, the results consistently indicate that FREEPII maintains robust generalization. The effectiveness of the regularization strategies also supports the potential for extending FREEPII to a deeper architecture in future work without sacrificing generalization.
FREEPII’s top performance in multilevel clustering benchmarks suggests its ability to recover protein groupings that align with known biological organization (Fig. 4A). Given that protein sequences are the key input of FREEPII, we further evaluate the structural confidence of the predicted clusters using AlphaFold 3 [36, 37]. We use the interface predicted template modeling (ipTM) score to assess the prediction confidence, as it reflects the accuracy of the predicted relative positions of subunits within a complex. To benchmark performance, we compared the average ipTM scores of FREEPII-predicted clusters against those of randomly generated clusters. As shown in Supplementary Fig. S10, the average ipTM score of the FREEPII-predicted clusters was significantly higher than the average of the random distribution. These results suggest that, even in the absence of other structural information such as bond angles, modifications, and sequence variants, FREEPII can group sequence-related proteins into clusters that are more likely to form energetically stable structures rather than randomly composed interactions. In future work, we plan to incorporate attention mechanisms (which have been widely adopted in models such as Transformer [26] and AlphaFold [32, 36, 37]) to further enhance the learning of protein representations by considering their own context and differences from all other proteins. This approach may improve the model’s ability to capture complex interaction patterns beyond the capabilities of CNNs.
Finally, our results demonstrate that combining CF-MS datasets with varying fractionation resolutions and species origins improves the generalization and sensitivity of CNN-based models, as exposure to biologically and experimentally diverse inputs promotes the learning of transferable representations and enhances predictive robustness across domains—analogous to how data augmentation improves model adaptability in vision tasks [38]. Incorporating complementary sequence-derived features further strengthens the model’s ability to recover positive PPIs, which is particularly valuable for expanding experimentally testable interaction space in downstream applications. These findings demonstrate the effectiveness of FREEPII’s end-to-end framework in integrating multimodal biological data to support generalizable protein interaction discovery across diverse experimental contexts.
Conclusions
In summary, FREEPII establishes a unified computational framework that jointly leverages CF-MS data and sequence-derived features to learn discriminative protein-level representations within a coherent architectural design, enabling accurate PPI classification and a biologically consistent protein complex inference. Its architecture learns from heterogeneous biological modalities while preserving the distinct contributions of each input source. The supervised protein embeddings are used to incorporate higher-order interaction context derived from complex-level annotations, extending the model’s representational capacity. Architectural design choices, including residual connections and modality-aware input handling, contribute to training stability and reliable predictions for novel interactions, respectively. The strong generalization and sensitivity observed across diverse experimental settings highlight the effectiveness of FREEPII’s integrated multimodal design and data-driven feature learning for scalable and cross-species protein interaction inference. Moreover, the flexibility to incorporate diverse biological inputs positions FREEPII as a robust and extensible framework for large-scale protein interaction and complex discovery.
Methods
CF-MS dataset curation and data pre-processing
The human CF-MS datasets (PXD002892, PXD014820, and PXD015406) were downloaded from Zenodo (doi: 10.5281/zenodo.4106578), where all uploaded data were reanalyzed by Skinnider and Foster using MaxQuant [17]. Among the various files corresponding to different protein quantification strategies provided by the authors, we selected those containing iBAQ intensity of chromatograms for further analysis. The yeast (Saccharomyces cerevisiae) CF-MS dataset (PXD031967) was curated by ourselves, with the detailed experimental processes described in our previous publication [39]. The number of proteins, the number of fractions, and the fractionation methods for all datasets are listed in Table 2. Protein overlap between experiments on the same species is shown in Supplementary Fig. S1. For handling missing values, we replaced them with zeros and then removed rows containing only zero values. Subsequently, we conducted normalization to ensure the sum of each CF-MS profile equaled 1. The number of genes in each CF-MS dataset is also listed in Table 2. To standardize input dimensions across datasets with varying fraction numbers, all elution profiles were adjusted to a fixed length of 200 fractions. Zero-padding was applied only to the end of each profile, thereby preserving the original coelution signal. This procedure enabled cross-dataset training and prediction while maintaining the biological interpretability of elution patterns.
Protein complex collection and protein-pair labeling
The human protein complex dataset was downloaded from the CORUM database [40], while the yeast protein complex dataset was downloaded from Costanzo et al. [41], which was manually inspected for physical protein–protein interactions and modified to remove genetic interactions and redundant protein complexes. In total, 3,614 and 575 protein complexes were documented in human and yeast, respectively. We filtered out protein complexes consisting of fewer than 3 genes, resulting in 2,277 human and 317 yeast protein complexes. The number of proteins present in the known protein complexes for each CF-MS dataset is shown in Table 2.
Protein pairs within the same protein complex are labeled as “positive PPIs,” and protein pairs that exist between different protein complexes are labeled as “negative PPIs.” However, positive PPIs are reclassified as negative if they lack any coeluting characteristic (where signal multiplication for the same fraction is greater than 0.01). Protein pairs in the CF-MS data are categorized as “experimental PPIs” if they do not fall into the positive or negative PPI categories. Only positive PPIs and negative PPIs are used for model training and evaluation.
Protein sequence collection and numerical representation
Human and yeast protein sequences were retrieved from the UniProt database [42] and subsequently converted into FCGR [43, 44] using R package “kaos.” We set the resolution to 16 and scaling factor to 0.863271 to prevent the overlap of attractors [45, 46]. Each FCGR frequency matrix was normalized so that all values ranged between 0 and 1. The frequency matrix of each of N proteins was then reshaped into dimension 12561 and concatenated in the first dimension to form a matrix with dimension N**2561 as the input of the model.
Model architecture
FREEPII consists of an embedding layer, a CNN layer, and 3 fully connected layers, as shown in Supplementary Fig. S2. The embedding layer generates protein embeddings, which are directly added to the FCGR matrix. This combined matrix is then concatenated with the CF-MS matrix, forming the input for FREEPII. The input is subsequently transformed into a feature map via the CNN layer, with the original input being reintroduced into each filter channel of the feature map. Features corresponding to paired proteins in the feature map are extracted using the provided pairing indexes, followed by the subtraction of paired features. A similar process is applied to the input to get another matrix of feature differences. These 2 difference matrices, derived from the input and feature map, are concatenated along the second dimension, and the final dimension is flattened to form a 2-dimensional matrix. This matrix is then passed through 3 linear layers to generate the final prediction scores for the protein pairs. Detailed parameter settings are available in the following code: https://github.com/qqpigass/FREEPII.
Model training and evaluation on PPIs
During model training, subsets of positive and negative PPIs were used, maintaining a ratio of 1:1, and 5-fold cross-validation was applied to obtain the average performance of the model. To classify predicted interactions as positive or negative, a hard threshold of 0.5 was set. Interactions with predicted scores less than or equal to 0.5 were defined as negative PPIs, while those above 0.5 were considered as positive PPIs. Four classification evaluation metrics were used, including sensitivity, specificity, MCC, and AUC of ROC, defined as follows:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} \textit{Sensitivity} = \frac{{\ TP}}{{TP + FN}} \end{eqnarray*}\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} \textit{Specificity} = \frac{{\ TN}}{{TN + FP}} \end{eqnarray*}\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} MCC = \frac{{\ TP \times TN - FP \times FN}}{{\sqrt {\left( {TP + FP} \right)\left( {TP + FN} \right)\left( {TN + FP} \right)\left( {TN + FN} \right)} }} \end{eqnarray*}\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} AUC\ of\ ROC = 1 - \textit{Specificity}\ at\ \textit{various}\ \textit{threshold}\ \textit{values} \end{eqnarray*}\end{document}Comparison with other CF-MS analysis tools
We compared the performances of FREEPII in both PPI classification and clustering evaluation with 2 existing CF-MS analysis tools, EPIC and SPIFFED. For EPIC, we used its default feature extraction metrics—namely, mutual information, Bayes correlation, Euclidean distance, Jaccard score, and Apex score—to generate features of PPIs for model training. We performed a 5-fold cross-validation under the conditions of data balance and a training/test ratio of 70:30 [22]. For a fair comparison, our clustering algorithm is used to generate clusters from the outputs of EPIC and SPIFFED.
Ablation study on model architecture
We conducted an ablation study to gain deeper insights into the impact of design components, including feature extraction, consideration of protein sequences, and network-level information. In this study, we employed a RF model using CF-MS data as input (denoted as RF-C in this study) as the baseline model for PPI classification tasks. The depth of RF was set to 1,000, and 7 features—including distance correlation, weighted cross-correlation (WCC), mutual information (MI), cosine similarity, Pearson and Spearman correlation, and Kendall rank correlation—were computed as the features of paired CF-MS profiles. These feature combinations enabled the RF model to achieve the best classification performance among those studied by Skinnider and Foster [17]. The structures of CNN-based models for ablation study are shown in Supplementary Fig. S2.
Visualization of feature representations of proteins labeled by protein complexes
The feature map of CF-MS data (NMF, where N represents the number of proteins, M denotes the dimension along fractions, and F indicates the number of filters) was extracted, and the dimension of filters was flattened to form a new matrix with dimension N**M′* (where M′ represents the product of the dimension along fractions and the number of filters). Each row in this matrix represents a feature representation of a protein. Subsequently, we merged the protein feature representations with the names of the protein complexes to which they belong. To ensure visualization clarity, we selected 9 human protein complexes (28S ribosomal subunit, mitochondrial; 39S ribosomal subunit, mitochondrial; 40S ribosomal subunit, cytoplasmic; 60S ribosomal subunit, cytoplasmic; Nop56p-associated pre-rRNA complex; nuclear pore complex; PA700 complex; spliceosome, A complex; TRBP containing complex) and 11 yeast protein complexes (19S proteasome regulatory particle; 90S preribosome; cytoplasmic ribosomal large subunit; cytoplasmic ribosomal small subunit; F0/F1 ATP synthase [complex V]; Kornberg’s mediator [SRB] complex; mitochondrial small ribosomal subunit; mitochondrial large ribosomal subunit; nuclear pore; preribosome, large subunit precursor; RSC complex) for labeling. Duplicate proteins were filtered out before merging with the feature representation matrix. The labeled feature representation matrix was then dimensionally reduced to 2 dimensions via t-SNE for visualization.
Visualize feature hotspots for classifying each PPI by computing saliency maps
To assess the contributions of CF-MS data and protein sequences in FREEPII’s prediction of PPIs, we employed the saliency map [47] to visualize the feature hotspots for each PPI classification. Based on the average intensity difference of the CF-MS data and protein sequences inputs, we categorized the classified PPIs into 3 groups: if the average intensity of the CF-MS region exceeds that of the protein sequence region (SEQ) by more than 0.1, the classification is labeled as CF-MS > SEQ; if it is less than −0.1, it is labeled as CF-MS < SEQ; otherwise, it is classified as CF-MS ∼ SEQ. The final representation of the saliency map for each PPI category is formed by superimposing the values of the saliency map calculated for each PPI and normalizing these values to range between 0 and 1.
Generating clusters using predicted PPI scores
The pairing indexes (edges) and prediction scores (weights) are used to construct the adjacency matrix “A.” Proteins (nodes) without any neighbors are removed. The adjacency matrix “A” is then processed through the Markov cluster algorithm (MCL) [48, 49] with the expansion and inflation parameters set to 2, iterating 3 times to obtain the matrix “A_.” To consider the topological properties within the adjacency matrix, the topological overlap matrix (TOM) [50] “I_” is calculated according to the following formula:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} {{w}_{ij}} = \frac{{{{l}_{ij}} + {{a}_{ij}}}}{{min\left\{ {{{k}_i},{{k}_j}} \right\} + 1 - {{a}_{ij}}}}, \end{eqnarray*}\end{document}where w_ij_ is the new weight of edge between node i and node j, a_ij_ is the weight of edge between node i and node j on the 2.5th power of “A,” l_ij_ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} = \sum \nolimits_u {{a}{iu}}{{a}{uj}}\end{document} , and k_i_ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} = \sum \nolimits_u {{a}{iu}}\end{document} is the node connectivity. Then matrix “A” and matrix “I_” are then combined in proportions of weights 0.3 and 0.7, as well as weights 0.1 and 0.9, respectively, to form 2 importance matrices. Cosine distance matrices are calculated from these 2 matrices, followed by Ward hierarchical clustering. The dynamic cut-tree algorithm of the Python function “cutreHybrid” is used to obtain clusters, with “minClusterSize” set to 3 and “deepSplit” set to 3 (“deepSplit” should be set to 1 or 2 for relatively small amounts of data).
The clusters obtained from the 2 matrices are then combined to form a set of clusters with overlapping members. To prevent the generation of unreasonably large clusters, we limit the size of the clusters to fewer than 100. If any cluster exceeds this limit, the above splitting steps are repeated to ensure compliance. For highly overlapping clusters, an iterative merging step is performed. The merge threshold is set to 0.25, where the denominator is the product of the sizes of 2 clusters, and the numerator is the square of the amount of overlap between the 2 clusters.
Gene function annotations and semantic similarity measurement of Gene Ontology terms
We retrieved the semantics and relationships between Gene Ontology (GO) terms from the GO Consortium released on 4 November 2022. To measure the semantic similarity between GO terms of proteins within the same complex, the GOGO algorithm is used [27, 51]. For a GO term t, the semantic contribution weight is calculated according to the link type and the number of child nodes, considering the semantic contribution of ancestors in the GO directed acyclic graph (DAG) to t:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} {{W}_e} = \frac{1}{{\left( {c + nc\left( t \right)} \right)}} + d, \end{eqnarray*}\end{document}where nc(t) is the total number of child nodes for GO term t. The constant parameter c is set to 0.67 in GOGO to ensure 0 < W_e_≤ 1. The constant parameter d is assigned values of 0.4 and 0.3 for the “is-a” and “part-of” relationships, respectively. For each term in DAG_t_, it has a semantic contribution to the target term t, defined as the S-value:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} \left\lbrace \begin{array}{c} {S}_{t}\left( {\textit{self}} \right) = 1\\ {S}_{t}\left( {\textit{other}} \right) = max\left\lbrace {W}_{e} \times {S}_{t}\left( {\textit{other}} \right)| {\textit{other}\in}\, \textit{children}\left( t \right)\right\rbrace\end{array} \right. \end{eqnarray*}\end{document}The semantic value of GO term t is the summation of S-values in DAG_t_:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} SV\left( t \right) = \sum \nolimits_{i\in t,\ \textit{ancestors}\left( t \right)} {{S}_t}\left( i \right) \end{eqnarray*}\end{document}Given another GO term k, the semantic similarity between 2 GO terms is defined as
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} {S}_{GO} (t,k) = \frac{ \sum \nolimits_{i\in ( {\textit{ancestors}}( t ) \cap\, \textit{ancestors}( k ) )} {S}_{t}(i) + {S}_{k}(i )} {SV( t ) + SV( k )}. \end{eqnarray*}\end{document}To calculate the semantic similarity between a gene G1 with m GO terms go11, go12, … go1m and a single GO term t, the equation is as follows:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} Sim( {t,{{G}_1}} ) = {}_{1 \le i \le m}^{max}( {{{S}_{GO}}( {t,\ g{{o}_{1i}}} )} ). \end{eqnarray*}\end{document}Given another gene G2 with n GO terms go21, go22, … go2n, the functional similarity between G1 and G2 is
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} Sim( {{{G}_1},\ {{G}_2}} ) = \frac{{\mathop \sum \nolimits_{1 \le i \le m} Sim( {g{{o}_{1i}},\ {{G}_2}} ) + \mathop \sum \nolimits_{1 \le j \le n} Sim( {g{{o}_{2j}},\ {{G}_1}} )}}{{m + n}}. \end{eqnarray*}\end{document}For a protein complex, the average of the pairwise functional similarities between all genes comprising the complex is calculated.
Colocalization within protein complexes
To assess the similarity of protein localizations within a complex, we use the colocalization score as defined in the study [52]. The subcellular locations of proteins are download from UniProt database [42]. The colocalization score of a protein complex is calculated as follows:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} \textit{Scor}{{e}_{co - \textit{localization}}} = \frac{{ma{{x}_i}{{l}_i}}}{{\left| C \right|}}, \end{eqnarray*}\end{document}where l_i_ is the number of proteins of complex C assigned to the localization group i, and |C| is the number of proteins in the complex C with localization assignments. This score provides the maximum fraction of proteins within the complex that share the same localization, divided by the total number of proteins in that complex with known localizations. This metric ensures that the higher the score, the more colocalized the proteins within a complex are, indicating better functional coherence.
Structure similarity between predicted clusters and reference protein complex dataset
To evaluate the structural compositional similarity between predicted clusters and a reference protein complex dataset, we use the composite score, as described in the literature [19, 53]. The composite score is the sum of 3 components: overlap, accuracy, and maximum matching ratio (MMR) [53]. Overlap is defined as the percentage of predicted clusters that have an overlap score larger than 0.25 with any reference complex. The overlap score is calculated as the square of the number of overlapping proteins between the predicted cluster and the reference complex, divided by the product of their sizes. Accuracy is the geometric mean of sensitivity and positive predictive value (PPV), where sensitivity and PPV are calculated by the following formulas:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} \textit{Sensitivty} = \frac{{\mathop \sum \nolimits_{i = 1}^n max_{j = 1}^m{{t}_{ij}}}}{{\mathop \sum \nolimits_{i = 1}^n \left| {{{b}_i}} \right|}}, \end{eqnarray*}\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} PPV = \frac{{\mathop \sum \nolimits_{j = 1}^m max_{i = 1}^n{{T}_{ij}}}}{{\mathop \sum \nolimits_{j = 1}^m \mathop \sum \nolimits_{i = 1}^n {{T}_{ij}}}},\ and \end{eqnarray*}\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} \textit{Accuracy} = \sqrt {\textit{Sensitivty} \times PPV} , \end{eqnarray*}\end{document}where i is the index of the protein complex from 1 to n, and j is the index of the predicted cluster from 1 to m. t_ij_ and T_ij_ are the number of overlapping proteins between complex_i_ and cluster_j_, and |b_i_| is the size of complex_i_. The MMR builds on the maximal matching of the overlap score for each reference complex and all the predicted clusters, and it is calculated by dividing the sum of the matched overlap score by the number of reference complexes.
Evaluate the structure confidence of predicted clusters by AlphaFold-Multimer
To evaluate the feasibility of predicted clusters from the perspective of protein sequences, we used the AlphaFold Server (AlphaFold3, AF3) [37] to predict the structure of selected clusters. Due to the high computational cost, we limited the evaluation to protein clusters predicted by FREEPII on the Y1 dataset, specifically those with sizes ranging from 3 to 7 proteins. We use the ipTM score to assess the prediction confidence, as it reflects the accuracy of the predicted relative positions of the subunits with a complex. The idea of our comparison is to use the average scores of randomly generated clusters as a baseline to assess whether the average scores of clusters predicted by FREEPII are significantly larger than that of randomly generated clusters. Since AF3’s server limits outputs to 30 predictions per account per day, we precomputed a pool of random clusters to support statistical analysis. To account for ipTM score distribution differences by cluster sizes, we first generated 20 random clusters for each size (from 3 to 7) as background samples. We then created a synthetic set of random clusters with the same size distribution as the FREEPII-predicted clusters and calculated the average ipTM score for each set. This procedure was repeated 100,000 times via a bootstrap strategy to establish a background distribution. A 1-sample t test was used to assess whether the value is significantly different from the mean of the null distribution.
Availability of Source Code and Requirements
Project name: FREEPII
Project homepage: https://github.com/qqpigass/FREEPII
Operating system(s): Platform independent
Programming language: Python
License: MIT
biotoolsID: freepii
WorkflowHub DOI [54]: https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1844.1
Additional Files
Supplementary Table S1. Average GOGO scores across 3 ontologies (biological process [BP], molecular function [MF], and cellular component [CC]; see Methods for details) for clusters produced by EPIC, SPIFFED, and FREEPII in each experiment. The highest score is highlighted in black bold text.
Supplementary Table S2. Average colocalization score for clusters produced by EPIC, SPIFFED, and FREEPII in each experiment. The highest score is highlighted in black bold text.
Supplementary Table S3. Average GOGO scores across 3 ontologies (biological process [BP], molecular function [MF], and cellular component [CC]; see Methods for details) for clusters produced by each model in the ablation study in each experiment. The highest score is highlighted in black bold text.
Supplementary Table S4. Average colocalization score for clusters produced by each model in the ablation study in each experiment. The highest score is highlighted in black bold text.
Supplementary Table S5. Classification performance of Tapioca on datasets H1 and Y3. Tapioca was evaluated on our datasets H1 and Y3, which differs in resolution and protein coverage. Despite these differences, Tapioca showed similar performance across both datasets, consistent with the results reported in its original publication (Supplementary Table S1).
Supplementary Table S6. Parameter sizes of each layer in the SPIFFED-like and the FREEPII-like models, using 32 filters in the convolution layer as an example. Both models consist of 1 convolutional layer and 3 fully connected layers. The size of the convolutional layer’s weights in the FREEPII-like model is half that of the SPIFFED-like model.
Supplementary Table S7. Comparison of memory usage between SPIFFED-like and FREEPII-like models. The table reports the maximum memory usage of each model during training, measured in bytes.
Supplementary Table S8. Comparison of time complexity between SPIFFED-like and FREEPII-like models. The table reports the runtime (in seconds) required for each model to start the convergence phase during training.
Supplementary Table S9. Parameter sizes of each layer in FREEPII. The architecture includes 1 embedding layer, 1 convolutional layer, and 3 fully connected layers. N denotes the number of proteins.
Supplementary Table S10. Input dimensions of FREEPII for each dataset. The input profile has 2 dimensions: the number of proteins and the length of the input sequence (CF-MS combined with protein sequence information). The training PPIs input is represented by the number of protein pairs and the indices of the 2 proteins comprising each pair.
Supplementary Fig. S1. Number of intersected proteins in CF-MS data across various experiments conducted on the same species. Circles of different colors within the figure represent different experiments.
Supplementary Fig. S2. Architecture of models in the ablation study. This diagram illustrates the inputs, outputs, and architectures of all models used in the ablation experiment, taking 1-fold of training data from the human dataset H1 as an example.
Supplementary Fig. S3. PPI classification performance in the ablation study. This extended version of Fig. 2B encompasses PPI classification performances for additional models. In particular, it incorporates CNN using only protein sequence as input without protein embeddings (CNN-S) and CNN using both protein sequences and protein embeddings (CNN-SE), along with the previously compared models. All performance metrics are plotted using the models’ predictions on the testing set.
Supplementary Fig. S4. Visualizing feature representations of proteins learned by models via t-SNE (human). Different colors represent labels for 9 selected human protein complexes to maintain clarity in the visualization. Additionally, cosine distances between pairs of protein feature representations within and between protein complexes are calculated. The Kruskal–Wallis test is used to assess whether there is a significant difference in distance distributions between the 2 groups. The significance levels are denoted as follows: ns: P > 0.05, *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001, ****P ≤ 0.0001.
Supplementary Fig. S5. Visualizing feature representations of proteins learned by models via t-SNE (yeast). Different colors correspond to labels for 11 selected yeast protein complexes, chosen to maintain clarity in the visualization. Additionally, cosine distances between pairs of protein feature representations within and between protein complexes are calculated. The Kruskal–Wallis test is used to assess whether the difference in distance distribution between the 2 groups is statistically significant. The significance levels are denoted as follows: ns: P > 0.05, *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001, ****P ≤ 0.0001.
Supplementary Fig. S6. Composite score for clusters produced by each model in the ablation study across experiments. Each model in the ablation experiment is evaluated on the composite score, which is the sum of the overlap score (red), accuracy (blue), and MMR (green).
Supplementary Fig. S7. Training curves of the SPIFFED-like and FREEPII-like models with different numbers of filters in the convolution layer. The curves illustrate the different convergence speeds of the 2 models. The dashed line indicates an accuracy threshold of 0.8. The time required for each model to reach this threshold is recorded and used to compare their time complexity.
Supplementary Fig. S8. Proportion of positive PPIs among all predicted PPIs. The proportion of positive PPIs among all predicted PPIs is calculated, and only the classification results of CNN-SE had an unusually high proportion of positive PPIs.
Supplementary Fig. S9. Learning curves of FREEPII on human and yeast datasets. Training and testing loss curves are shown to assess whether the model exhibits signs of overfitting during training.
Supplementary Fig. S10. The average ipTM score of clusters predicted by FREEPII is significantly higher than the mean of the bootstrap-generated random distribution. The dashed line represents the mean ipTM score of the predicted clusters. The 1-sample t-test was used to assess whether this mean is significantly different from that of the bootstrap distribution.
giaf122_FREEPII_Supplementary_Material_0903
giaf122_Authors_Response_To_Reviewer_Comments_Original_Submission
giaf122_Authors_Response_To_Reviewer_Comments_Revision_1
giaf122_Authors_Response_To_Reviewer_Comments_Revision_2
giaf122_GIGA-D-25-00010_Original_Submission
giaf122_GIGA-D-25-00010_Revision_1
giaf122_GIGA-D-25-00010_Revision_2
giaf122_GIGA-D-25-00010_Revision_3
giaf122_Reviewer_1_Report_Original_SubmissionYuansheng Liu -- 2/9/2025
giaf122_Reviewer_1_Report_Revision_1Yuansheng Liu -- 5/28/2025
giaf122_Reviewer_1_Report_Revision_2Yuansheng Liu -- 7/21/2025
giaf122_Reviewer_2_Report_Original_SubmissionAndrew Emili -- 2/11/2025
giaf122_Reviewer_2_Report_Revision_1Andrew Emili -- 5/23/2025
Abbreviations
AP-MS: affinity purification/mass spectrometry; AUC: area under the curve; BP: biological process; CC: cellular component; CF-MS: co-fractionation/mass spectrometry; CNN: convolutional neural network; DAG: directed acyclic graph; FCGR: frequency matrix chaos game representation; FN: false negative; FP: false positive; FREEPII: Feature Representation Enhancement End-to-End Protein Interaction Inference; GO: Gene Ontology; MCC: Matthews correlation coefficient; MCL: Markov cluster algorithm; MF: molecular function; MI: mutual information; MMR: maximum matching ratio; PPIs: protein–protein interactions; PPV: positive predictive value; RF: random forest; ROC: receiver operating characteristic; TN: true negative; TOM: topological overlap matrix; TP: true positive; t-SNE: t-distributed stochastic neighbor embedding; WCC: weighted cross-correlation; Y2H: yeast 2-hybrid.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Bludau I, Aebersold R. Proteomic and interactomic insights into the molecular basis of cell functional diversity. Nat Rev Mol Cell Biol. 2020;21(6):327–40.. 10.1038/s 41580-020-0231-2.32235894 · doi ↗ · pubmed ↗
- 2Larance M, Lamond AI. Multidimensional proteomics for cell biology. Nat Rev Mol Cell Biol. 2015;16(5):269–80.. 10.1038/nrm 3970.25857810 · doi ↗ · pubmed ↗
- 3Shi C, Liu F, Su X, et al. Comprehensive discovery and functional characterization of the noncanonical proteome. Cell Res. 2025;35(3):186–204.. 10.1038/s 41422-024-01059-3.39794466 PMC 11909191 · doi ↗ · pubmed ↗
- 4Cheng F, Zhao J, Wang Y, et al. Comprehensive characterization of protein–protein interactions perturbed by disease mutations. Nat Genet. 2021;53(3):342–53.. 10.1038/s 41588-020-00774-y.33558758 PMC 8237108 · doi ↗ · pubmed ↗
- 5Lu H, Zhou Q, He J, et al. Recent advances in the development of protein–protein interactions modulators: mechanisms and clinical trials. Signal Transduct Target Ther. 2020;5(1):213. 10.1038/s 41392-020-00315-3.32968059 PMC 7511340 · doi ↗ · pubmed ↗
- 6Huttlin EL, Bruckner RJ, Paulo JA, et al. Architecture of the human interactome defines protein communities and disease networks. Nature. 2017;545(7655):505–9.. 10.1038/nature 22366.28514442 PMC 5531611 · doi ↗ · pubmed ↗
- 7Paiano A, Margiotta A, De Luca M, et al. Yeast two-hybrid assay to identify interacting proteins. Curr Protoc Protein Sci. 2019;95(1):e 70. 10.1002/cpps.70.30133175 · doi ↗ · pubmed ↗
- 8Luck K, Kim DK, Lambourne L, et al. A reference map of the human binary protein interactome. Nature. 2020;580(7803):402–8.. 10.1038/s 41586-020-2188-x.32296183 PMC 7169983 · doi ↗ · pubmed ↗
