Multi-modal Transfer Learning between Biological Foundation Models
Juan Jose Garau-Luis, Patrick Bordes, Liam Gonzalez, Masa Roller, Bernardo P. de Almeida, Lorenz Hexemer, Christopher Blum, Stefan Laurent, Jan Grzegorzewski, Maren Lang, Thomas Pierrot, Guillaume Richard

TL;DR
This paper introduces IsoFormer, a multi-modal transfer learning model that integrates DNA, RNA, and protein data to improve predictions of gene expression and transcript isoform diversity, advancing computational genomics.
Contribution
The paper presents a multi-modal model that connects DNA, RNA, and protein sequences through pre-trained modality-specific encoders, enabling improved gene expression predictions.
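To make the idea of combining modality-specific encoders concrete, here is a minimal toy sketch. It is not IsoFormer's architecture: this summary does not describe the fusion mechanism, so the mean-pooling and concatenation below are assumptions chosen for simplicity. The names `toy_encoder`, `fuse`, and `predict_expression` are hypothetical, and the random lookup tables merely stand in for frozen pre-trained encoders.

```python
import random

# Hypothetical stand-in for a frozen, pre-trained modality-specific encoder.
# It maps each character of a sequence to a fixed random vector.
def toy_encoder(sequence, alphabet, dim, seed):
    rng = random.Random(seed)  # fixed seed: the "encoder" is deterministic
    table = {ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for ch in alphabet}
    return [table[ch] for ch in sequence]

# Collapse per-token embeddings into one vector by averaging each dimension.
def mean_pool(token_embeddings):
    n = len(token_embeddings)
    return [sum(column) / n for column in zip(*token_embeddings)]

# Assumed fusion step: concatenate the pooled DNA, RNA, and protein vectors.
def fuse(dna, rna, protein, dim=4):
    parts = [
        mean_pool(toy_encoder(dna, "ACGT", dim, seed=0)),
        mean_pool(toy_encoder(rna, "ACGU", dim, seed=1)),
        mean_pool(toy_encoder(protein, "ACDEFGHIKLMNPQRSTVWY", dim, seed=2)),
    ]
    return [value for part in parts for value in part]

# Linear prediction head on the fused features (weights would be learned).
def predict_expression(features, weights, bias=0.0):
    return sum(w * x for w, x in zip(weights, features)) + bias

features = fuse("ACGTAC", "ACGUAC", "MKV")
score = predict_expression(features, [0.1] * len(features))
```

In this sketch only the prediction head would be trained, while the encoders stay fixed. That is one common way to reuse pre-trained models, and it is not necessarily the training scheme used in the paper.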
Findings
Outperforms existing methods in predicting differential transcript expression
Effectively leverages multiple modalities for biological sequence modeling
Enables efficient transfer learning across modalities
Abstract
Biological sequences encode fundamental instructions for the building blocks of life, in the form of DNA, RNA, and proteins. Modeling these sequences is key to understanding disease mechanisms and is an active research area in computational biology. Recently, Large Language Models have shown great promise in solving certain biological tasks, but current approaches are limited to a single sequence modality (DNA, RNA, or protein). Key problems in genomics intrinsically involve multiple modalities, but it remains unclear how to adapt general-purpose sequence models to those cases. In this work, we propose a multi-modal model that connects DNA, RNA, and proteins by leveraging information from different pre-trained modality-specific encoders. We demonstrate its capabilities by applying it to the largely unsolved problem of predicting how multiple RNA transcript isoforms originate from the same…
Taxonomy
Topics: Machine Learning in Bioinformatics · Biomedical Text Mining and Ontologies · Machine Learning and Data Classification
