Classifying the Stoichiometry of Virus-like Particles with Interpretable Machine Learning
Jiayang Zhang, Xianyuan Liu, Wei Wu, Sina Tabakhi, Wenrui Fan, Shuo Zhou, Kang Lan Tee, Tuck Seng Wong, Haiping Lu

TL;DR
This paper presents an interpretable machine learning pipeline for classifying virus-like particle stoichiometry, enabling efficient analysis and feature identification to aid vaccine development.
Contribution
It introduces a new dataset and a linear model-based approach for classifying VLP stoichiometry with interpretability and feature analysis.
Findings
The pipeline accurately classifies VLP stoichiometry.
Feature encoding impacts model performance and interpretability.
Key protein sequence features influencing assembly are identified.
Abstract
Virus-like particles (VLPs) are valuable for vaccine development due to their immune-triggering properties. Understanding their stoichiometry, the number of protein subunits that form a VLP, is critical for vaccine optimisation. However, current experimental methods for determining stoichiometry are time-consuming and require highly purified proteins. To classify proteins by stoichiometry class efficiently, we curate a new dataset and propose an interpretable, data-driven pipeline built on linear machine learning models. We also explore how feature encoding affects model performance and interpretability, and examine methods for identifying the key protein sequence features that influence classification. The evaluation of our pipeline demonstrates that it can classify stoichiometry while revealing protein features that possibly influence VLP assembly. The data and code used in this work are…
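The abstract describes the pipeline only at a high level: encode protein sequences as features, fit a linear model, then read the model's weights to see which features drive the classification. The sketch below illustrates that general idea and is not the authors' implementation. The amino-acid composition encoding, the hand-written logistic regression, the toy sequences, and the binary labels 0 and 1 are all assumptions chosen for illustration; the encodings, models, and stoichiometry classes used in the paper are not given in this summary.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition_encode(seq):
    """Encode a sequence as the fraction of each standard amino acid (20 values)."""
    seq = seq.upper()
    total = sum(seq.count(a) for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / total for a in AMINO_ACIDS]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Fit binary logistic regression with batch gradient descent and L2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def top_features(w, k=3):
    """Rank residues by absolute weight; a larger |weight| means more influence."""
    ranked = sorted(zip(AMINO_ACIDS, w), key=lambda p: abs(p[1]), reverse=True)
    return ranked[:k]

# Hypothetical toy data: made-up sequences with made-up class labels.
toy = [
    ("KKRKKAKRKK", 1), ("RKKKRRAKKG", 1), ("KRKRKKKGAK", 1),
    ("DEEDDEAEDG", 0), ("EEDEDDGEAD", 0), ("DDEEEDAEGE", 0),
]
X = [composition_encode(s) for s, _ in toy]
y = [label for _, label in toy]

w, b = fit_logistic(X, y)
preds = [predict(w, b, x) for x in X]
print("predictions:", preds)
print("most influential residues:", top_features(w))
```

Because the model is linear, each weight maps directly to one input feature, so the sign and size of a weight show which residues push a sequence towards which class. A richer encoding, such as one with positional or physicochemical features, would change both accuracy and what the weights mean, which is the trade-off between encoding and interpretability that the abstract mentions.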
Taxonomy
Topics: Bacteriophages and microbial interactions · Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies
