Generalized Canonical Correlation Analysis for Disparate Data Fusion

Ming Sun; Carey E. Priebe; Minh Tang

arXiv:1209.3761·stat.ML·September 18, 2012

TL;DR

This paper explores the efficiency of Generalized Canonical Correlation Analysis (GCCA) for data fusion across disparate data sources, with a focus on text classification tasks.

Contribution

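It analyzes the efficiency of Canonical Correlation Analysis (CCA) and its generalization, GCCA, for manifold matching and data fusion. Both methods belong to the Reduced Rank Regression (RRR) framework and embed disparate data spaces into a common low-dimensional space for joint inference.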

Findings

01. GCCA effectively fuses multiple data sources for classification.

02. Efficiency varies under different training conditions.

03. GCCA outperforms traditional CCA in certain scenarios.

Abstract

Manifold matching works to identify embeddings of multiple disparate data spaces into the same low-dimensional space, where joint inference can be pursued. It is an enabling methodology for fusion and inference from multiple and massive disparate data sources. In this paper we focus on a method called Canonical Correlation Analysis (CCA) and its generalization Generalized Canonical Correlation Analysis (GCCA), which belong to the more general Reduced Rank Regression (RRR) framework. We present an efficiency investigation of CCA and GCCA under different training conditions for a particular text document classification task.
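
To make the idea of embedding two data spaces into a common low-dimensional space concrete, here is a minimal sketch of classical two-view CCA in NumPy. It is not the authors' implementation and does not cover GCCA or the RRR formulation. The function name `cca`, the ridge term `reg`, and the synthetic data are all assumptions introduced for illustration.

```python
import numpy as np

def cca(X, Y, d, reg=1e-6):
    """Two-view CCA sketch (hypothetical helper, not from the paper).

    X: (n, p) and Y: (n, q) are paired observations of the same n objects.
    Returns projection matrices A (p, d), B (q, d) and the top-d
    canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariances; `reg` is an assumed small ridge for stability.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whiten each view with Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # T = Lx^{-1} Sxy Ly^{-T} is the cross-covariance of the whitened views.
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the canonical correlations.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the whitened directions back to the original coordinates.
    A = np.linalg.solve(Lx.T, U[:, :d])
    B = np.linalg.solve(Ly.T, Vt.T[:, :d])
    return A, B, s[:d]

# Synthetic example (assumed data): two noisy views of one shared latent signal.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))
X = z @ rng.normal(size=(1, 5)) + 0.1 * rng.normal(size=(n, 5))
Y = z @ rng.normal(size=(1, 4)) + 0.1 * rng.normal(size=(n, 4))

A, B, rho = cca(X, Y, d=2)
X_embedded = (X - X.mean(axis=0)) @ A  # both views now live in the same
Y_embedded = (Y - Y.mean(axis=0)) @ B  # d-dimensional space
print("canonical correlations:", rho)
```

In this sketch, matched rows of `X_embedded` and `Y_embedded` land close together along the leading dimension, so a classifier trained on one view's embedding could in principle be applied to the other. Extending this to more than two views is what GCCA addresses.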

Taxonomy

Topics: Face and Expression Recognition · Image Retrieval and Classification Techniques · Bayesian Methods and Mixture Models