Multimodal Prediction based on Graph Representations
Icaro Cavalcante Dourado, Salvatore Tabbone, Ricardo da Silva Torres

TL;DR
This paper introduces a graph-based learning model for multimodal prediction tasks that effectively captures relationships between different data modalities, improving accuracy over existing fusion methods.
Contribution
The paper presents a novel rank-fusion graph approach that encodes multiple descriptors into a graph and projects it into a vector space for improved multimodal prediction.
Findings
Outperforms early-fusion and late-fusion methods on various datasets
Effective across visual, textual, and multimodal features
Demonstrates superior accuracy compared to state-of-the-art techniques
Abstract
This paper proposes a learning model, based on rank-fusion graphs, for general applicability in multimodal prediction tasks, such as multimodal regression and image classification. Rank-fusion graphs encode information from multiple descriptors and retrieval models, thus being able to capture underlying relationships between modalities, samples, and the collection itself. The solution is based on the encoding of multiple ranks for a query (or test sample), defined according to different criteria, into a graph. Later, we project the generated graph into an induced vector space, creating fusion vectors, targeting broader generality and efficiency. A fusion vector estimator is then built to infer whether a multimodal input object refers to a class or not. Our method is capable of promoting a fusion model better than early-fusion and late-fusion alternatives. Performed experiments in the…
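The pipeline described in the abstract (several ranks per query, merged into a graph, projected into a vector, then fed to an estimator) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the reciprocal-position weighting, the product-based edge weights, the node-plus-edge vector layout, the toy two-modality collection, and the 1-nearest-neighbour estimator. All function names (`rank`, `fusion_graph`, `fusion_vector`, `embed`, `predict`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rank(query, collection, dist, k):
    """Return the ids of the k collection items closest to the query."""
    order = sorted(range(len(collection)), key=lambda i: dist(query, collection[i]))
    return order[:k]


def fusion_graph(ranks):
    """Merge several ranked lists into one weighted graph.

    Assumed weighting (illustrative only): an item at 0-based position p
    scores 1 / (p + 1). Node weights sum these scores over all lists; the
    edge weight of two items sums the product of their scores over the
    lists in which both appear.
    """
    nodes = defaultdict(float)
    edges = defaultdict(float)
    for ranked in ranks:
        score = {item: 1.0 / (pos + 1) for pos, item in enumerate(ranked)}
        for item, s in score.items():
            nodes[item] += s
        for a, b in combinations(sorted(ranked), 2):
            edges[(a, b)] += score[a] * score[b]
    return nodes, edges


def fusion_vector(nodes, edges, n_items):
    """Project the graph into a fixed-length vector: node weights, then edge weights."""
    pairs = combinations(range(n_items), 2)
    return [nodes.get(i, 0.0) for i in range(n_items)] + [edges.get(p, 0.0) for p in pairs]


# Toy collection: each item is (visual feature, textual feature), both scalars.
collection = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 0.9), (0.9, 1.1), (1.1, 1.0)]
labels = [0, 0, 0, 1, 1, 1]

# One ranking criterion per modality.
rankers = [
    lambda q, x: abs(q[0] - x[0]),  # visual distance
    lambda q, x: abs(q[1] - x[1]),  # textual distance
]


def embed(sample, k=3):
    """Fusion vector of a sample: rank per modality, fuse into a graph, project."""
    ranks = [rank(sample, collection, dist, k) for dist in rankers]
    nodes, edges = fusion_graph(ranks)
    return fusion_vector(nodes, edges, len(collection))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0


# Stand-in estimator (assumption): 1-nearest neighbour over fusion vectors.
train = [embed(x) for x in collection]


def predict(sample):
    q = embed(sample)
    best = max(range(len(train)), key=lambda i: cosine(q, train[i]))
    return labels[best]


print(predict((0.98, 1.02)))
```

In this toy setup, a query near the second cluster retrieves only items 3, 4, and 5 under both modalities, so its fusion vector overlaps only with the vectors of class 1 samples. Any vector-space classifier or regressor could replace the nearest-neighbour step.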
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Advanced Text Analysis Techniques
Methods: Test
