Multimodal Representation Learning using Adaptive Graph Construction

Weichen Huang

arXiv:2410.06395·cs.LG·October 10, 2024

TL;DR

AutoBIND is a new contrastive learning framework that adaptively constructs graphs to learn from multiple modalities, demonstrating superior performance in Alzheimer's disease detection with diverse data types.

Contribution

It introduces AutoBIND, a novel method capable of handling an arbitrary number of modalities through graph optimization, unlike previous fixed-architecture approaches.

Findings

01

AutoBIND outperforms previous methods on Alzheimer's disease detection.

02

The framework effectively integrates multiple data modalities.

03

AutoBIND demonstrates generalizability to real-world medical data.

Abstract

Multimodal contrastive learning trains neural networks by leveraging data from heterogeneous sources such as images and text. Yet many current multimodal learning architectures cannot generalize to an arbitrary number of modalities and need to be hand-constructed. We propose AutoBIND, a novel contrastive learning framework that can learn representations from an arbitrary number of modalities through graph optimization. We evaluate AutoBIND on Alzheimer's disease detection because it has real-world medical applicability and it contains a broad range of data modalities. We show that AutoBIND outperforms previous methods on this task, highlighting the generalizability of the approach.
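
The abstract does not spell out the loss function or how the graph is built, so the following is only an illustrative sketch, not the paper's implementation. It assumes modalities are nodes in a graph, that each weighted edge marks a pair of modalities to align, and that alignment uses a standard symmetric InfoNCE contrastive loss. The function names, the modality names, the edge weights, and the temperature value are all hypothetical.

```python
import numpy as np

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE loss; row i of `a` and row i of `b` form a positive pair."""
    # L2-normalize so the dot product is cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # shape (n, n)

    def cross_entropy_diag(l):
        # Numerically stable log-softmax over each row, then pick the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the a-to-b and b-to-a directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

def graph_contrastive_loss(embeddings, edges):
    """Weighted sum of pairwise losses over the edges of a modality graph.

    embeddings: dict mapping modality name -> (n_samples, dim) array
    edges: iterable of (modality_u, modality_v, weight) tuples (assumed format)
    """
    return sum(w * info_nce(embeddings[u], embeddings[v]) for u, v, w in edges)

# Toy usage with random embeddings for three hypothetical modalities.
rng = np.random.default_rng(0)
emb = {name: rng.normal(size=(8, 16)) for name in ("modality_a", "modality_b", "modality_c")}
edges = [("modality_a", "modality_b", 1.0), ("modality_b", "modality_c", 0.5)]
total = graph_contrastive_loss(emb, edges)
```

Because the loss is a sum over edges, adding a modality under this assumption only means adding a node and its edges rather than redesigning the architecture. How the edges and their weights are chosen is the graph-optimization step the abstract refers to, and it is not reproduced here.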

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies

Methods: Contrastive Learning