Driving Accurate Allergen Prediction with Protein Language Models and Generalization-Focused Evaluation
Brian Shing-Hei Wong, Joshua Mincheol Kim, Sin-Hang Fung, Qing Xiong, Kelvin Fu-Kiu Ao, Junkang Wei, Ran Wang, Dan Michelle Wang, Jingying Zhou, Bo Feng, Alfred Sze-Lok Cheng, Kevin Y. Yip, Stephen Kwok-Wing Tsui, Qin Cao

TL;DR
This paper introduces Applm, an allergen prediction framework built on a large protein language model. It outperforms existing methods in difficult real-world scenarios: identifying novel allergens, separating allergens from highly similar non-allergen homologs, and assessing the effects of mutations.
Contribution
The paper presents Applm, an allergen prediction method built on the 100-billion-parameter xTrimoPGLM protein language model, and evaluates it on difficult real-world tasks designed to test generalization.
Findings
Applm consistently outperforms seven state-of-the-art methods across a diverse set of tasks.
xTrimoPGLM captures general protein features crucial for allergen prediction.
The framework effectively identifies novel allergens and assesses mutation impacts.
Abstract
Allergens, typically proteins capable of triggering adverse immune responses, represent a significant public health challenge. To accurately identify allergen proteins, we introduce Applm (Allergen Prediction with Protein Language Models), a computational framework that leverages the 100-billion parameter xTrimoPGLM protein language model. We show that Applm consistently outperforms seven state-of-the-art methods in a diverse set of tasks that closely resemble difficult real-world scenarios. These include identifying novel allergens that lack similar examples in the training set, differentiating between allergens and non-allergens among homologs with high sequence similarity, and assessing functional consequences of mutations that create few changes to the protein sequences. Our analysis confirms that xTrimoPGLM, originally trained on one trillion tokens to capture general protein…
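As an illustration of what evaluating on "novel allergens that lack similar examples in the training set" can mean in practice, here is a minimal Python sketch that keeps only test sequences dissimilar to every training sequence. It is not the authors' procedure: the k-mer Jaccard similarity is a crude stand-in for alignment-based sequence identity, and the function names, the value of `k`, and the `threshold` are all hypothetical choices made for this example.

```python
from typing import List, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_novel_test_set(
    train: List[str],
    test: List[str],
    threshold: float = 0.3,  # hypothetical cutoff, not taken from the paper
    k: int = 3,
) -> List[str]:
    """Keep test sequences whose best match in the training set is below threshold."""
    train_sets = [kmers(s, k) for s in train]
    novel = []
    for seq in test:
        seq_kmers = kmers(seq, k)
        # Highest similarity between this test sequence and any training sequence.
        max_sim = max((jaccard(seq_kmers, t) for t in train_sets), default=0.0)
        if max_sim < threshold:
            novel.append(seq)
    return novel


# Toy sequences, invented for illustration only.
train_seqs = ["MKTAYIAKQR", "GSHMLEDPVA"]
test_seqs = ["MKTAYIAKQR", "WWWWCCCCHH", "MKTAYIAKQK"]
novel_seqs = filter_novel_test_set(train_seqs, test_seqs)
```

In this toy case only the second test sequence survives the filter: the first is identical to a training sequence and the third differs from it by a single residue. A real benchmark would more likely rely on an alignment or clustering tool to measure similarity.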
