Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities in Biomedicine

Konstantin Hemker; Nikola Simidjievski; Mateja Jamnik

arXiv:2405.19950 · cs.LG · April 17, 2025 · 1 citation

Open Access · 1 Repo

TL;DR

Multimodal Lego (MM-Lego) is a framework for merging and fine-tuning diverse unimodal encoders into multimodal models without extensive retraining, aimed particularly at biomedical applications.

Contribution

Introduces MM-Lego, a general-purpose fusion framework that turns unimodal encoders into a multimodal model with no or minimal fine-tuning, addressing limitations of existing fusion methods such as the need for end-to-end training and topology-specific designs.

Findings

01 Achieves competitive performance without end-to-end training.

02 Operates on any unimodal encoder.

03 Surpasses benchmarks in five out of seven datasets.

Abstract

Learning holistic computational representations in physical, chemical or biological systems requires the ability to process information from different distributions and modalities within the same model. Thus, the demand for multimodal machine learning models has sharply risen for modalities that go beyond vision and language, such as sequences, graphs, time series, or tabular data. While there are many available multimodal fusion and alignment approaches, most of them require end-to-end training, scale quadratically with the number of modalities, cannot handle cases of high modality imbalance in the training set, or are highly topology-specific, making them too restrictive for many biomedical learning tasks. This paper presents Multimodal Lego (MM-Lego), a general-purpose fusion framework to turn any set of encoders into a competitive multimodal model with no or minimal fine-tuning. We…

Code & Models

Repositories

konst-int-i/mm-lego (PyTorch, official)

Taxonomy

Topics: Speech and dialogue systems

Methods: Sparse Evolutionary Training