TL;DR
This paper introduces a Bayesian generative modeling framework for high-dimensional heterogeneous data, enabling effective multimodal data fusion and analysis in tasks like anomaly detection and recommendation.
Contribution
It develops a scalable Bayesian approach that combines diverse data types via exponential family distributions, extending latent space embedding to heterogeneous datasets.
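As a minimal sketch of what combining data types through exponential family distributions can look like, the snippet below scores one mixed record (numerical plus categorical) under a shared latent vector. Every name here (`z`, `W_num`, `W_cat`, `joint_log_likelihood`) is hypothetical, and the unit-variance Gaussian and softmax categorical are assumptions chosen for illustration; this is not the paper's actual model or inference procedure.

```python
import numpy as np

def joint_log_likelihood(z, x_num, x_cat, W_num, W_cat):
    """Log-likelihood of one mixed record given a shared latent vector z.

    Assumptions (illustrative only): the numerical block is Gaussian with
    unit variance, and the categorical block is a single softmax variable.
    """
    # Gaussian block: with unit variance the natural parameter equals the mean.
    mean = W_num @ z
    ll_num = -0.5 * np.sum((x_num - mean) ** 2) - 0.5 * x_num.size * np.log(2 * np.pi)

    # Categorical block: the natural parameters are the logits.
    logits = W_cat @ z
    shifted = logits - logits.max()              # shift for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    ll_cat = log_probs[x_cat]

    # Both blocks depend on the same z, so their log-likelihoods simply add.
    return ll_num + ll_cat

# Toy example: 3 latent factors, 2 numerical features, 1 categorical feature with 4 levels.
rng = np.random.default_rng(0)
W_num = rng.normal(size=(2, 3))
W_cat = rng.normal(size=(4, 3))
z = rng.normal(size=3)
score = joint_log_likelihood(z, np.array([0.1, -0.3]), 2, W_num, W_cat)
```

A low log-likelihood under such a model is one plausible way to flag an unusual record, though this summary does not say how the paper scores anomalies.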
Findings
The method scales to millions of instances and thousands of features.
It achieves competitive performance on anomaly detection, data imputation, and recommendation tasks.
Experiments on the NYC Taxi and MovieLens datasets validate its effectiveness.
Abstract
The commonly used latent space embedding techniques, such as Principal Component Analysis, Factor Analysis, and manifold learning techniques, are typically used for learning effective representations of homogeneous data. However, they do not readily extend to heterogeneous data that are a combination of numerical and categorical variables, e.g., arising from linked GPS and text data. In this paper, we are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion. The learned generative model provides latent unified representations that capture the factors common to the multiple dimensions of the data, and thus enable fusing multimodal data for various machine learning tasks. Following a Bayesian approach, we propose a general framework that combines disparate data types through the natural parameterization of the…
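For readers unfamiliar with the term, the natural parameterization of an exponential family distribution can be written as below. The first expression is the standard textbook form. The linear link from a shared latent vector $z$ to the natural parameter of each data block $d$ is an assumption added here for illustration, and the symbols $W_d$ and $b_d$ do not come from the source.

```latex
p(x_d \mid \eta_d) = h_d(x_d)\,\exp\!\big(\eta_d^{\top} T_d(x_d) - A_d(\eta_d)\big),
\qquad
\eta_d = W_d z + b_d \quad \text{(assumed link)}
```

For a Gaussian with known variance $\sigma^2$, $T(x) = x$ and $\eta = \mu / \sigma^2$; for a categorical variable, $\eta$ is the vector of logits and $A(\eta)$ is the log-sum-exp. Giving every data type a natural parameter that depends on the same $z$ is what allows numerical and categorical blocks to share one latent representation.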
Taxonomy
Methods: Greedy Policy Search
