OneProt: Towards Multi-Modal Protein Foundation Models
Klemens Flöge, Srisruthi Udayakumar, Johanna Sommer, Marie Piraud, Stefan Kesselheim, Vincent Fortuin, Stephan Günnemann, Karel J van der Weg, Holger Gohlke, Erinc Merdivan, Alina Bazarova

TL;DR
OneProt is a multi-modal protein AI model integrating structural, sequence, text, and binding site data, demonstrating improved retrieval and downstream task performance, and enabling transfer learning across protein representations.
Contribution
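OneProt introduces a multi-modal protein modeling approach built on the ImageBind framework, with modality encoders that mix Graph Neural Networks (GNNs) and transformers. It highlights the importance of binding site data and the model's ability to transfer across protein representations.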
Findings
Strong performance in retrieval tasks
Enhanced downstream task accuracy
Binding site encoder is crucial for performance
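To make the retrieval finding concrete, the sketch below shows one common way cross-modal retrieval is scored: recall@k over cosine similarity between embeddings of two modalities. This is an illustration under assumptions, not the authors' evaluation code. The function names `cosine` and `recall_at_k` are hypothetical, and the convention that row i of the query list and row i of the target list describe the same protein is assumed here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def recall_at_k(query_emb, target_emb, k=1):
    """Fraction of queries whose matching target (same index) ranks in the top k.

    Assumption: query_emb[i] and target_emb[i] embed the same protein in two
    different modalities (for example sequence and structure).
    """
    hits = 0
    for i, query in enumerate(query_emb):
        scores = [cosine(query, target) for target in target_emb]
        # Rank target indices from most to least similar to the query.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(query_emb)
```

With well-aligned latent spaces, the matching target should rank near the top for most queries, so recall@k approaches 1 as alignment improves.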
Abstract
Recent advances in Artificial Intelligence have enabled multi-modal systems to model and translate diverse information spaces. Extending beyond text and vision, we introduce OneProt, a multi-modal AI for proteins that integrates structural, sequence, text, and binding site data. Using the ImageBind framework, OneProt aligns the latent spaces of protein modality encoders in a lightweight fine-tuning scheme that focuses on pairwise alignment with sequence data rather than requiring full matches. This novel approach comprises a mix of Graph Neural Networks and transformer architectures. It demonstrates strong performance in retrieval tasks and showcases the efficacy of multi-modal systems in Protein Machine Learning through a broad spectrum of downstream baselines, including enzyme function prediction and binding site analysis. Furthermore, OneProt enables the transfer of representational…
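The pairwise alignment with sequence data described in the abstract can be illustrated with a symmetric InfoNCE contrastive loss, the kind of objective used in ImageBind-style training. The sketch below is a minimal pure-Python version under assumptions: the function names, the temperature value of 0.07, and the toy batch layout are illustrative and not taken from the paper, and the real model would compute this on encoder outputs with a deep learning framework.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def info_nce(seq_emb, other_emb, temperature=0.07):
    """Symmetric InfoNCE between sequence embeddings and one other modality.

    Assumption: row i of seq_emb and row i of other_emb describe the same
    protein; every other row in the batch serves as a negative.
    """
    a = [l2_normalize(v) for v in seq_emb]
    b = [l2_normalize(v) for v in other_emb]
    n = len(a)
    # logits[i][j]: scaled similarity of sequence i to other-modality item j.
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Cross-entropy with the diagonal (matching pair) as the correct class,
    # computed in both directions and averaged.
    loss_seq_to_other = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    loss_other_to_seq = -sum(log_softmax(logits_t[i])[i] for i in range(n)) / n
    return 0.5 * (loss_seq_to_other + loss_other_to_seq)
```

Because each non-sequence modality is paired only with sequence, a loss of this shape needs (sequence, structure), (sequence, text), or (sequence, binding site) pairs rather than proteins annotated in every modality at once, which matches the abstract's point about not requiring full matches.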
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Bioinformatics and Genomic Networks
Methods: ALIGN
