LANISTR: Multimodal Learning from Structured and Unstructured Data

Sayna Ebrahimi; Sercan O. Arik; Yihe Dong; Tomas Pfister

arXiv:2305.16556 · cs.LG · April 25, 2024 · 2 cites

TL;DR

LANISTR is an attention-based framework that learns jointly from language, images, and structured (tabular and time-series) data. It improves results on real-world healthcare and retail tasks, including when some modalities are missing.

Contribution

LANISTR introduces masking-based training at both the unimodal and multimodal levels, together with a similarity-based multimodal masking loss for learning cross-modal relations from large-scale multimodal data with missing modalities.

Findings

01. Achieves a 6.6% AUROC improvement on healthcare data (MIMIC-IV)
02. Achieves a 14% accuracy improvement on retail data (Amazon Product Review)
03. Robust to high ratios of samples with missing modalities

Abstract

Multimodal large-scale pretraining has shown impressive performance for unstructured data such as language and image. However, a prevalent real-world scenario involves structured data types, tabular and time-series, along with unstructured data. Such scenarios have been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured data. The core of LANISTR's methodology is rooted in masking-based training applied across both unimodal and multimodal levels. In particular, we introduce a new similarity-based multimodal masking loss that enables it to learn cross-modal relations from large-scale multimodal data with missing modalities. On two real-world datasets, MIMIC-IV (from healthcare) and Amazon Product Review (from retail), LANISTR demonstrates remarkable improvements, 6.6% (in AUROC) and 14% (in accuracy)…
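
To make the idea of a similarity-based masking objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes the loss is a negative cosine similarity between the fused embedding of a masked input and that of the unmasked input; the abstract does not spell out the exact form. The names `mask_features`, `toy_fusion`, and `similarity_loss` are hypothetical, and `toy_fusion` is a mean-pooling stand-in for the attention-based encoder that simply skips modalities set to `None`.

```python
import math
import random


def mask_features(features, mask_ratio, rng, mask_value=0.0):
    """Replace a random subset of entries with mask_value (illustrative masking)."""
    n_mask = int(round(mask_ratio * len(features)))
    masked_idx = set(rng.sample(range(len(features)), n_mask))
    return [mask_value if i in masked_idx else x for i, x in enumerate(features)]


def toy_fusion(modalities):
    """Hypothetical stand-in for a multimodal encoder.

    Mean-pools the per-modality vectors that are present and ignores
    modalities whose value is None (a missing modality).
    """
    present = [m for m in modalities.values() if m is not None]
    if not present:
        raise ValueError("at least one modality must be present")
    dim = len(present[0])
    return [sum(m[i] for m in present) / len(present) for i in range(dim)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch only
    return dot / (norm_u * norm_v)


def similarity_loss(emb_masked, emb_full):
    """Assumed loss form: negative cosine similarity (-1 when perfectly aligned)."""
    return -cosine_similarity(emb_masked, emb_full)


if __name__ == "__main__":
    rng = random.Random(0)
    # One sample with a missing tabular modality (values are made up).
    sample = {
        "text": [0.2, 0.9, 0.1, 0.4],
        "image": [0.5, 0.3, 0.8, 0.1],
        "tabular": None,
    }
    masked = {
        name: (mask_features(vec, 0.5, rng) if vec is not None else None)
        for name, vec in sample.items()
    }
    loss = similarity_loss(toy_fusion(masked), toy_fusion(sample))
    print(f"similarity loss: {loss:.4f}")
```

Minimizing such a loss pushes the representation of the masked input toward that of the full input, which is one plausible way an encoder could learn cross-modal relations without requiring every modality to be present in every sample.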

Code & Models

Repositories

google-research/lanistr (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing

Methods: Test