CLARGA: Multimodal Graph Representation Learning over Arbitrary Sets of Modalities

Santosh Patapati

arXiv:2512.11901·cs.CV·December 16, 2025

TL;DR

CLARGA is a multimodal fusion architecture that builds a separate attention-weighted graph for each sample, so that it can fuse any number of modalities efficiently and tolerate missing inputs across diverse tasks and datasets.

Contribution

The paper introduces a general-purpose, graph-based multimodal fusion framework that handles any number and type of modalities without changes to the underlying architecture.

Findings

01 Outperforms baselines and state-of-the-art models on 7 diverse datasets.

02 Demonstrates robustness to missing modality inputs.

03 Efficiently scales with the number of modalities due to sub-quadratic complexity.

Abstract

We introduce CLARGA, a general-purpose multimodal fusion architecture for multimodal representation learning that works with any number and type of modalities without changing the underlying framework. Given a supervised dataset, CLARGA can be applied to virtually any machine learning task to fuse different multimodal representations for processing by downstream layers. On a sample-by-sample basis, CLARGA learns how modalities should inform one another by building an attention-weighted graph over their features and passing messages along this graph with a multi-head Graph Attention Network. This not only makes CLARGA highly adaptive, since it constructs a unique graph for each sample, but also yields efficient fusion with sub-quadratic complexity as the number of modalities grows. Through a learnable mask, it can also adapt to missing modality inputs. The model is trained with a…
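To make the per-sample graph idea concrete, here is a minimal Python sketch of attention-weighted message passing over modality nodes. It is an illustration under stated assumptions, not the paper's implementation: the function names `softmax` and `fuse_modalities` are hypothetical, edge weights come from plain scaled dot products rather than learned multi-head Graph Attention Network parameters, the missing-modality mask is a fixed boolean list rather than a learnable one, and all present modalities are connected to each other, so this simplified version is quadratic in the number of modalities and does not reproduce the sub-quadratic scaling the abstract describes.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features, present):
    """Fuse per-modality vectors for ONE sample (illustrative sketch).

    features: list of M equal-length float lists (assumed encoder outputs).
    present:  list of M bools; False marks a missing modality.
    Returns (fused_vector, attention), where attention[i][j] is the weight
    that modality i places on modality j.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    # Missing modalities are dropped from the graph entirely.
    active = [i for i, is_present in enumerate(present) if is_present]
    if not active:
        raise ValueError("at least one modality must be present")

    attention = {}
    updated = {}
    for i in active:
        # Edge scores from node i to every present node (self-loop included).
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / scale
            for j in active
        ]
        weights = softmax(scores)
        attention[i] = dict(zip(active, weights))
        # Message passing: attention-weighted sum of neighbour features.
        updated[i] = [
            sum(w * features[j][k] for w, j in zip(weights, active))
            for k in range(dim)
        ]

    # Mean-pool the updated nodes into a single fused representation.
    fused = [sum(updated[i][k] for i in active) / len(active) for k in range(dim)]
    return fused, attention


# Three toy modality embeddings; the second modality is missing for this sample.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attention = fuse_modalities(toy_features, [True, False, True])
```

Because the attention weights are recomputed from each sample's own features, two samples with the same set of modalities generally get different graphs, which is the adaptivity the abstract refers to.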

Taxonomy

Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Emotion and Mood Recognition