Curriculum-guided multimodal representation learning enables generalizable prediction of nanomaterial-protein interactions
Hengjie Yu, Kenneth A. Dawson, Haiyun Yang, Shuya Liu, Yan Yan, Yaochu Jin

TL;DR
This paper introduces CuMMI, a curriculum-guided multimodal model that predicts nanomaterial-protein interactions and generalizes to unseen nanomaterials and proteins, trained in multiple stages on a million-scale dataset.
Contribution
The study presents a generalizable framework that combines multimodal inputs (protein sequence, protein structure, and text-encoded experimental context) with curriculum learning to improve nanomaterial-protein interaction prediction.
Findings
CuMMI achieves mean classification metrics over 0.75 across multiple validation tests.
The model effectively generalizes to unseen nanomaterials and proteins.
Fine-tuning with limited data enhances performance significantly.
Abstract
Nanomaterial-protein interactions (NPI) are pivotal to realizing the therapeutic and diagnostic potential of nanomaterials. Although AI promises to accelerate mechanistic understanding and enable rational nanomaterial design, robust generalization to unseen nanomaterials or proteins remains unresolved. Here, we present CuMMI (curriculum-guided multimodal interaction model), a generalizable, explainable, and transferable model designed to infer NPI across complex biological settings. CuMMI leverages a self-constructed million-scale NPI dataset and adopts a multi-stage curriculum centered on human plasma, with progressively broader biofluid exposure to enhance data coverage and generalizability. By integrating protein sequence, structure, and a text-encoded experimental context of 37 features, CuMMI captures complementary material-specific, biochemical, and environmental information.…
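The multi-stage curriculum described above (start from human plasma, then progressively widen biofluid coverage) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `curriculum_stages`, the `(biofluid, sample_id)` record layout, and the biofluid names other than human plasma are all hypothetical, chosen only to show how each stage's training pool could grow cumulatively.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical record layout: (biofluid label, sample id). Not the paper's schema.
Sample = Tuple[str, int]


def curriculum_stages(
    samples: Sequence[Sample],
    stage_fluids: Sequence[Sequence[str]],
) -> Iterator[List[Sample]]:
    """Yield one training pool per stage.

    Each stage adds its biofluids to those already seen, so the pool
    only grows: stage 1 is the narrowest, the last stage the broadest.
    """
    allowed = set()
    for fluids in stage_fluids:
        allowed.update(fluids)
        yield [s for s in samples if s[0] in allowed]


# Toy data; "human serum" and "mouse plasma" are illustrative assumptions.
samples = [
    ("human plasma", 0),
    ("human serum", 1),
    ("mouse plasma", 2),
    ("human plasma", 3),
]
stages = [["human plasma"], ["human serum"], ["mouse plasma"]]

pools = list(curriculum_stages(samples, stages))
for i, pool in enumerate(pools, start=1):
    print(f"stage {i}: {len(pool)} samples")
```

In a real training loop, each pool would feed one training phase, with the model weights carried over from the previous stage.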
