Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models

Yasin Yilmaz; Mehmet Aktukmak; Alfred O. Hero

arXiv:2108.12445 · cs.LG · October 4, 2021

TL;DR

This paper introduces a Bayesian generative modeling framework for high-dimensional heterogeneous data, enabling effective multimodal data fusion and analysis in tasks like anomaly detection and recommendation.

Contribution

It develops a scalable Bayesian approach that combines diverse data types via exponential family distributions, extending latent space embedding to heterogeneous datasets.
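As a rough illustration of how exponential family distributions can tie different data types to one latent space, the sketch below draws a shared latent vector z and maps it linearly to the natural parameters of a Gaussian block (numerical features) and a categorical block (one categorical feature). This is a minimal sketch under stated assumptions, not the authors' implementation: the dimensions `K`, `D_num`, `C`, the loading matrices `W_num`, `W_cat`, the offsets, and the fixed noise level `sigma` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical sizes: K latent factors, D_num Gaussian features, one categorical feature with C levels.
rng = np.random.default_rng(0)
K, D_num, C = 2, 3, 4

# Hypothetical loadings/offsets mapping the shared latent z to each block's parameters.
W_num = rng.normal(size=(D_num, K))
b_num = np.zeros(D_num)
W_cat = rng.normal(size=(C, K))
b_cat = np.zeros(C)
sigma = 0.5  # assumed known noise std of the Gaussian block

def softmax(eta):
    e = np.exp(eta - eta.max())  # subtract max for numerical stability
    return e / e.sum()

def logsumexp(eta):
    m = eta.max()
    return m + np.log(np.sum(np.exp(eta - m)))

def sample_instance(rng):
    """Draw one heterogeneous instance (numeric vector, category index) from a shared latent z."""
    z = rng.normal(size=K)                # shared latent factor, z ~ N(0, I)
    mean = W_num @ z + b_num              # Gaussian mean; with known sigma the natural parameter is mean / sigma**2
    x_num = rng.normal(mean, sigma)
    eta = W_cat @ z + b_cat               # categorical natural parameters (log-probabilities up to a constant)
    x_cat = rng.choice(C, p=softmax(eta))
    return z, x_num, x_cat

def log_lik(x_num, x_cat, z):
    """log p(x_num, x_cat | z); the two blocks are conditionally independent given z."""
    mean = W_num @ z + b_num
    ll_num = (-0.5 * np.sum((x_num - mean) ** 2) / sigma**2
              - D_num * np.log(sigma * np.sqrt(2 * np.pi)))
    eta = W_cat @ z + b_cat
    ll_cat = eta[x_cat] - logsumexp(eta)  # log-softmax of the observed category
    return ll_num + ll_cat
```

Because both blocks depend on the same z, evidence from one data type informs the other; that conditional-independence structure is what makes the shared latent space a fusion mechanism in this toy setting.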

Findings

1. The method scales to millions of instances and thousands of features.
2. It achieves competitive performance on anomaly detection, data imputation, and recommendation tasks.
3. Experiments on the NYC Taxi and MovieLens datasets validate the method's effectiveness.
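To make the imputation and anomaly detection tasks concrete, the sketch below uses a toy linear-Gaussian-plus-categorical model: it imputes a missing categorical feature from observed numerical features by inferring the shared latent z, and scores numerical observations by their negative log marginal likelihood. This is a hedged illustration only. It conditions on the Gaussian block alone so that the posterior over z is exact and closed form, whereas the paper's framework performs Bayesian inference across all data types; the parameters `W_num`, `W_cat`, `b_num`, `b_cat`, `sigma` and the functions `posterior_z`, `impute_category`, `anomaly_score` are hypothetical.

```python
import numpy as np

# Hypothetical toy model: z ~ N(0, I), x_num | z ~ N(W_num z + b_num, sigma^2 I),
# x_cat | z ~ Categorical(softmax(W_cat z + b_cat)).
rng = np.random.default_rng(1)
K, D_num, C = 2, 3, 4
W_num = rng.normal(size=(D_num, K))
b_num = np.zeros(D_num)
W_cat = rng.normal(size=(C, K))
b_cat = np.zeros(C)
sigma = 0.5

def posterior_z(x_num):
    """Exact Gaussian posterior p(z | x_num) for the linear-Gaussian block."""
    precision = np.eye(K) + W_num.T @ W_num / sigma**2
    cov = np.linalg.inv(precision)
    cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
    mean = cov @ W_num.T @ (x_num - b_num) / sigma**2
    return mean, cov

def impute_category(x_num, n_samples=2000, rng=rng):
    """Monte Carlo estimate of p(x_cat | x_num): average softmax over posterior samples of z."""
    mean, cov = posterior_z(x_num)
    zs = rng.multivariate_normal(mean, cov, size=n_samples)  # shape (n_samples, K)
    eta = zs @ W_cat.T + b_cat                                # shape (n_samples, C)
    eta -= eta.max(axis=1, keepdims=True)
    probs = np.exp(eta)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.mean(axis=0)

def anomaly_score(x_num):
    """Negative log marginal likelihood of x_num under N(b_num, W_num W_num^T + sigma^2 I)."""
    cov = W_num @ W_num.T + sigma**2 * np.eye(D_num)
    diff = x_num - b_num
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (diff @ np.linalg.solve(cov, diff) + logdet + D_num * np.log(2 * np.pi))
```

For example, `impute_category(np.array([0.3, -0.2, 1.0]))` returns a probability vector over the `C` categories, and points far from `b_num` receive larger `anomaly_score` values. Higher scores mean the observation is less likely under the fitted generative model.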

Abstract

The commonly used latent space embedding techniques, such as Principal Component Analysis, Factor Analysis, and manifold learning techniques, are typically used for learning effective representations of homogeneous data. However, they do not readily extend to heterogeneous data that are a combination of numerical and categorical variables, e.g., arising from linked GPS and text data. In this paper, we are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion. The learned generative model provides latent unified representations that capture the factors common to the multiple dimensions of the data, and thus enable fusing multimodal data for various machine learning tasks. Following a Bayesian approach, we propose a general framework that combines disparate data types through the natural parameterization of the…


Code & Models

Repositories

maktukmak/mmfa (official)


Taxonomy

Methods: Greedy Policy Search