Platform for Representation and Integration of multimodal Molecular Embeddings
Erika Yilin Zheng, Yu Yan, Baradwaj Simha Sankar, Ethan Ji, Steven Swee, Irsyad Adam, Ding Wang, Alexander Russell Pelletier, Alex Bui, Wei Wang, Peipei Ping

TL;DR
This paper introduces PRISME, a framework that integrates diverse molecular embeddings from multiple data sources into a unified representation, improving performance on biomedical tasks.
Contribution
The study presents a novel workflow using an autoencoder to combine heterogeneous molecular embeddings, enabling comprehensive and robust representations across biological contexts.
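As a rough illustration of the idea, the sketch below concatenates embeddings from several sources and trains a small autoencoder whose bottleneck serves as the integrated representation. It is not PRISME's implementation: the synthetic arrays `omics`, `text`, and `kg`, the helper names, the single tanh hidden layer, the mean-squared-error loss, and all hyperparameters are assumptions made for this example.

```python
import numpy as np


def standardize(X):
    """Z-score each column so that no single source dominates the loss."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)


def train_autoencoder(X, latent_dim=4, lr=0.1, epochs=300, seed=0):
    """Fit a tanh-encoder / linear-decoder autoencoder by full-batch
    gradient descent on mean squared reconstruction error.

    Returns an `encode` function and the per-epoch loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=1.0 / np.sqrt(d), size=(d, latent_dim))
    b_enc = np.zeros(latent_dim)
    W_dec = rng.normal(scale=1.0 / np.sqrt(latent_dim), size=(latent_dim, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode to the bottleneck, then reconstruct.
        Z = np.tanh(X @ W_enc + b_enc)
        X_hat = Z @ W_dec + b_dec
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for the mean squared error.
        g_out = 2.0 * err / err.size
        g_W_dec = Z.T @ g_out
        g_b_dec = g_out.sum(axis=0)
        g_pre = (g_out @ W_dec.T) * (1.0 - Z ** 2)  # tanh derivative
        g_W_enc = X.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        # Plain gradient-descent update.
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec
        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc

    def encode(A):
        return np.tanh(A @ W_enc + b_enc)

    return encode, losses


# Synthetic stand-ins for three embedding sources over the same 300 genes
# (hypothetical data; real embeddings would be loaded here instead).
rng = np.random.default_rng(0)
shared = rng.normal(size=(300, 4))  # hidden factors common to all sources
omics = shared @ rng.normal(size=(4, 10)) + 0.1 * rng.normal(size=(300, 10))
text = shared @ rng.normal(size=(4, 8)) + 0.1 * rng.normal(size=(300, 8))
kg = shared @ rng.normal(size=(4, 6)) + 0.1 * rng.normal(size=(300, 6))

X = standardize(np.hstack([omics, text, kg]))
encode, losses = train_autoencoder(X, latent_dim=4)
integrated = encode(X)  # one integrated vector per gene
print(integrated.shape, losses[0], losses[-1])
```

Standardizing before concatenation is a design choice of this sketch: without it, a source with larger numeric scale would dominate the reconstruction loss.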
Findings
PRISME outperforms individual embeddings in benchmark tasks.
Integrated embeddings capture more complete biological signals.
The approach enhances missing value imputation accuracy.
Abstract
Existing machine learning methods for molecular (e.g., gene) embeddings are restricted to specific tasks or data modalities, which limits their effectiveness to narrow domains. As a result, they fail to capture the full breadth of gene functions and interactions across diverse biological contexts. In this study, we systematically evaluated knowledge representations of biomolecules across multiple dimensions, in a task-agnostic manner, spanning three major data sources: omics experimental data, literature-derived text data, and knowledge graph-based representations. To distinguish meaningful biological signals from chance correlations, we devised an adjusted variant of Singular Vector Canonical Correlation Analysis (SVCCA) that quantifies signal redundancy and complementarity across different data modalities and sources. These analyses reveal that existing…
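For orientation, standard SVCCA can be sketched in NumPy as follows. This is the unadjusted method only; the adjustment described in the abstract is not specified in this excerpt and is not implemented here. The function names, the 99% variance threshold, and the random example matrices are assumptions for illustration, and the sketch assumes both matrices describe the same genes in the same row order.

```python
import numpy as np


def top_singular_subspace(X, var_kept=0.99):
    """Center X and return an orthonormal basis (in sample space) for the
    leading singular directions that explain `var_kept` of the variance."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(energy, var_kept)) + 1, len(s))
    return U[:, :k]


def svcca(X, Y, var_kept=0.99):
    """Mean canonical correlation between the SVD-reduced views X and Y.

    X: (n_samples, d1) and Y: (n_samples, d2), rows aligned.
    Returns (mean correlation, array of canonical correlations).
    """
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same number of rows")
    Ux = top_singular_subspace(X, var_kept)
    Uy = top_singular_subspace(Y, var_kept)
    # With orthonormal, centered bases, the canonical correlations are
    # the singular values of Ux^T Uy.
    rho = np.linalg.svd(Ux.T @ Uy, compute_uv=False)
    rho = np.clip(rho, 0.0, 1.0)
    return float(rho.mean()), rho


# Hypothetical example: one embedding, a rotated copy, and an unrelated one.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(200, 10))
rotation, _ = np.linalg.qr(rng.normal(size=(10, 10)))
emb_rotated = emb_a @ rotation           # same information, new basis
emb_unrelated = rng.normal(size=(200, 10))

score_rotated, _ = svcca(emb_a, emb_rotated)
score_unrelated, _ = svcca(emb_a, emb_unrelated)
print(score_rotated, score_unrelated)
```

Two unrelated random matrices still receive a nonzero score here, which illustrates the kind of chance correlation that an adjustment would need to account for.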
Taxonomy
Topics: Bioinformatics and Genomic Networks · Computational Drug Discovery Methods · Biomedical Text Mining and Ontologies
