# Complete end-to-end learning from protein feature representation to protein interactome inference

**Authors:** Yu-Hsin Chen, Chien-Fu Liu, Jun-Yi Leu, Huai-Kuang Tsai

PMC · DOI: 10.1093/gigascience/giaf122 · GigaScience · 2025-11-06

## TL;DR

FREEPII is a deep learning framework that improves the accuracy and scalability of protein–protein interaction (PPI) prediction from co-fractionation mass spectrometry (CF-MS) data and protein sequence features.

## Contribution

FREEPII introduces an end-to-end deep learning model that integrates CF-MS data and sequence features for robust and generalizable protein interaction inference.

## Key findings

- FREEPII consistently outperforms state-of-the-art CF-MS analysis tools and learns more biologically coherent and discriminative protein features.
- Multimodal data integration improves model generalization and sensitivity across datasets and species.
- Supervised embeddings capture higher-order interaction contexts for reliable discovery of novel interactions.

## Abstract

Co-fractionation coupled with mass spectrometry (CF-MS) is a powerful strategy for mapping protein–protein interactions (PPIs) under near-physiological conditions. Despite recent progress, existing analysis pipelines remain constrained by reliance on handcrafted features, sensitivity to experimental noise, and an inherent focus on pairwise interactions, which limit their scalability and generalizability. To address these difficulties, we introduce FREEPII (Feature Representation Enhancement End-to-End Protein Interaction Inference), a unified deep learning framework that integrates CF-MS data with sequence-derived features to learn biologically meaningful protein-level representations for accurate and efficient inference of PPIs and protein complexes.

FREEPII employs a convolutional neural network architecture to learn protein-level representations directly from raw data, enabling feature sharing across interaction pairs and reducing computational complexity. To enhance robustness against CF-MS noise, protein sequences are introduced as auxiliary input to enrich the feature space with complementary biological cues. The supervised protein embeddings further encode network-level context derived from complex annotations, allowing the model to capture higher-order interactions and enhance the expressive power of protein representations. Extensive benchmarking demonstrates that FREEPII consistently outperforms state-of-the-art CF-MS analysis tools, capturing more biologically coherent and discriminative protein features. Cross-dataset evaluations further reveal that integrating multimodal data from diverse experimental contexts substantially improves the generalization and sensitivity of data-driven models, offering a scalable, cross-species strategy for reliable protein interaction inference.
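The feature-sharing idea in the paragraph above can be illustrated with a small pure-Python sketch. It is not FREEPII's implementation. The single convolution layer with random, untrained kernels, the amino-acid-composition stand-in for sequence-derived features, and the cosine-similarity pair score are all assumptions chosen for brevity, and every function name is hypothetical. What the sketch does show is the cost structure of a protein-level design: each protein is encoded once, so N proteins need N encoder passes, while all N(N-1)/2 candidate pairs are scored from cached embeddings.

```python
import itertools
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def conv_branch(profile, kernels):
    """CF-MS branch (hypothetical): valid 1D convolution of an elution profile
    with each kernel, then ReLU and global max pooling (one feature per kernel)."""
    features = []
    for kernel in kernels:
        k = len(kernel)
        responses = [
            sum(w * x for w, x in zip(kernel, profile[i:i + k]))
            for i in range(len(profile) - k + 1)
        ]
        features.append(max(0.0, max(responses, default=0.0)))
    return features


def sequence_branch(seq):
    """Sequence branch (hypothetical stand-in): amino-acid composition, 20 fractions."""
    n = max(len(seq), 1)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def encode_protein(profile, seq, kernels):
    """Protein-level embedding: concatenation of the two modality branches."""
    return conv_branch(profile, kernels) + sequence_branch(seq)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_all_pairs(proteins, kernels):
    """Encode each protein once (N encoder passes), then score all N*(N-1)/2
    pairs by reusing the cached embeddings instead of re-featurizing each pair."""
    embeddings = {
        name: encode_protein(profile, seq, kernels)
        for name, (profile, seq) in proteins.items()
    }
    scores = {
        (a, b): cosine(embeddings[a], embeddings[b])
        for a, b in itertools.combinations(sorted(embeddings), 2)
    }
    return embeddings, scores


# Toy example: random (untrained) kernels and made-up profiles and sequences.
rng = random.Random(0)
kernels = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
proteins = {
    "P1": ([0, 1, 5, 9, 4, 1, 0, 0], "MKTAYIAKQR"),
    "P2": ([0, 1, 4, 8, 5, 1, 0, 0], "MKTAYLAKQR"),
    "P3": ([6, 2, 0, 0, 1, 3, 7, 2], "GGSWDEPLLS"),
    "P4": ([0, 0, 0, 2, 6, 9, 3, 0], "MSTNPKPQRK"),
}
embeddings, scores = score_all_pairs(proteins, kernels)
print(len(embeddings), "encoder passes,", len(scores), "pair scores")
```

With four toy proteins the sketch prints `4 encoder passes, 6 pair scores`. The saving grows quickly: 3,000 detected proteins need 3,000 encoder passes but yield about 4.5 million candidate pairs, which a pairwise-feature pipeline would have to featurize one by one.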

FREEPII provides a unified computational framework that integrates CF-MS data and sequence-derived features to learn discriminative and biologically consistent protein representations. By leveraging multimodal inputs through a coherent deep learning architecture, the model achieves accurate and scalable inference of PPIs and protein complexes across species. Its modality-aware design and supervised protein embeddings capture higher-order interaction contexts, ensuring robust generalization and reliable discovery of novel interactions. Overall, FREEPII offers a flexible and extensible foundation for data-driven exploration of protein interaction networks.
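How complex annotations can supervise protein-level embeddings is sketched below under an assumed, commonly used CF-MS convention: two annotated proteins form a positive pair if they share at least one reference complex and a negative pair otherwise. The abstract does not specify FREEPII's exact labeling scheme, and the function names and toy complexes are hypothetical. Encoding each protein as a membership vector over complexes keeps network-level context that flat pair labels discard.

```python
import itertools


def membership_vectors(complexes):
    """Map each annotated protein to a 0/1 vector over the (sorted) complexes."""
    names = sorted(complexes)
    proteins = sorted({p for members in complexes.values() for p in members})
    return {p: [1 if p in complexes[c] else 0 for c in names] for p in proteins}


def pair_labels(complexes):
    """Label every pair of annotated proteins: 1 if they share a complex, else 0."""
    vectors = membership_vectors(complexes)
    labels = {}
    for a, b in itertools.combinations(sorted(vectors), 2):
        shared = sum(x * y for x, y in zip(vectors[a], vectors[b]))
        labels[(a, b)] = 1 if shared > 0 else 0
    return labels


# Hypothetical reference complexes.
complexes = {
    "C1": {"A", "B", "C"},
    "C2": {"C", "D"},
    "C3": {"E", "F"},
}
labels = pair_labels(complexes)
positives = sorted(pair for pair, y in labels.items() if y == 1)
print(positives)  # [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D'), ('E', 'F')]
```

Here A and D are labeled negative even though both share a complex with C. The membership vector of C, `[1, 1, 0]`, records that it bridges complexes C1 and C2, which is the kind of higher-order context a purely pairwise label cannot express.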


## Full-text entities

- **Genes:** PAK1IP1 (PAK1 interacting protein 1) [NCBI Gene 55003] {aka MAK11, PIP1, WDR84, bA421M1.5, hPIP1}, TNNI3 (troponin I3, cardiac type) [NCBI Gene 7137] {aka CMD1FF, CMD2A, CMH7, RCM1, TNNC1, cTnI}, IBD2 (inflammatory bowel disease 2) [NCBI Gene 3378], FAF1 (Fas associated factor 1) [NCBI Gene 11124] {aka CGI-03, HFAF1s, UBXD12, UBXN3A, hFAF1}, CCT4 (chaperonin containing TCP1 subunit 4) [NCBI Gene 10575] {aka CCT-DELTA, Cctd, SRB}, GDE1 (glycerophosphodiester phosphodiesterase 1) [NCBI Gene 51573] {aka 363E6.2, MIR16}, MRM1 (mitochondrial rRNA methyltransferase 1) [NCBI Gene 79922], LSM12 (LSM12 homolog) [NCBI Gene 124801] {aka PNAS-135}
- **Diseases:** FCGR (MESH:C535406)
- **Species:** Homo sapiens (human, species) [taxon 9606], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932]
- **Cell lines:** S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232), SCR_026316 — Muntiacus muntjak (Barking deer), Transformed cell line (CVCL_4349)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12598752/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12598752/full.md

## References

57 references — full list in the complete paper: https://tomesphere.com/paper/PMC12598752/full.md

---
Source: https://tomesphere.com/paper/PMC12598752