DIA-CLIP: a universal representation learning framework for zero-shot DIA proteomics
Yucheng Liao, Han Wen, Weinan E, Weijie Zhang

TL;DR
DIA-CLIP is a universal, pre-trained model for DIA proteomics that enables high-precision, zero-shot peptide-spectrum matching, outperforming existing methods and enhancing protein identification across diverse conditions.
Contribution
The paper introduces DIA-CLIP, a novel cross-modal representation learning framework that shifts DIA analysis from semi-supervised training to a universal pre-trained model for improved generalization.
Findings
Up to 45% increase in protein identification.
12% reduction in entrapment identifications.
Outperforms state-of-the-art tools across benchmarks.
Abstract
Data-independent acquisition mass spectrometry (DIA-MS) has established itself as a cornerstone of proteomic profiling and large-scale systems biology, offering unparalleled depth and reproducibility. Current DIA analysis frameworks, however, require semi-supervised training within each run for peptide-spectrum match (PSM) re-scoring. This approach is prone to overfitting and lacks generalizability across diverse species and experimental conditions. Here, we present DIA-CLIP, a pre-trained model that shifts the DIA analysis paradigm from semi-supervised training to universal cross-modal representation learning. By integrating a dual-encoder contrastive learning framework with an encoder-decoder architecture, DIA-CLIP establishes a unified cross-modal representation for peptides and their corresponding spectral features, achieving high-precision, zero-shot PSM inference. Extensive evaluations across…
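The dual-encoder contrastive objective mentioned in the abstract can be illustrated with a generic CLIP-style symmetric loss. This is a minimal sketch, not the authors' implementation: the abstract does not specify the loss, the temperature, or the encoders, so the function names, the `temperature=0.07` default, and the toy embeddings below are all assumptions. It assumes row `i` of each embedding list is a matched peptide/spectrum pair and that no embedding is the zero vector.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes v is not all zeros).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity_matrix(peptide_embs, spectrum_embs):
    # sim[i][j] = cosine similarity between peptide i and spectrum j.
    p = [l2_normalize(v) for v in peptide_embs]
    s = [l2_normalize(v) for v in spectrum_embs]
    return [[sum(a * b for a, b in zip(pi, sj)) for sj in s] for pi in p]

def cross_entropy_diagonal(logits):
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(peptide_embs, spectrum_embs, temperature=0.07):
    # Hypothetical CLIP-style loss: average of the peptide-to-spectrum and
    # spectrum-to-peptide cross-entropies over temperature-scaled similarities.
    sim = cosine_similarity_matrix(peptide_embs, spectrum_embs)
    logits = [[x / temperature for x in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

def zero_shot_psm_scores(peptide_embs, spectrum_embs):
    # Score each candidate PSM (peptide i, spectrum i) by cosine similarity,
    # with no per-run training step.
    sim = cosine_similarity_matrix(peptide_embs, spectrum_embs)
    return [sim[i][i] for i in range(len(sim))]

# Toy embeddings (made up for illustration only).
peptides = [[1.0, 0.0], [0.0, 1.0]]
matched_spectra = [[2.0, 0.0], [0.0, 3.0]]
swapped_spectra = [[0.0, 3.0], [2.0, 0.0]]

loss_matched = symmetric_contrastive_loss(peptides, matched_spectra)
loss_swapped = symmetric_contrastive_loss(peptides, swapped_spectra)
scores = zero_shot_psm_scores(peptides, matched_spectra)
```

In this toy setup the correctly paired batch yields a much smaller loss than the swapped batch, which is the signal a contrastive objective of this kind uses to pull matched peptide and spectrum embeddings together. At inference time, a similarity score of the sort returned by `zero_shot_psm_scores` could rank candidate PSMs without re-training on each run, which is the zero-shot behaviour the abstract describes.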
Taxonomy
Topics: Advanced Proteomics Techniques and Applications · Mass Spectrometry Techniques and Applications · Machine Learning in Bioinformatics
