Generalizing Correspondence Analysis for Applications in Machine Learning

Hsiang Hsu; Salman Salamatian; Flavio P. Calmon

arXiv:1806.08449·cs.LG·July 1, 2020

TL;DR

This paper introduces a scalable, neural network-based approach to perform correspondence analysis (CA) for high-dimensional data, enabling visualization and interpretation of data dependencies in large datasets.

Contribution

It provides a novel information-theoretic interpretation of CA via principal inertia components and develops algorithms to estimate them using deep neural networks for large-scale applications.

Findings

01

Neural network algorithms reliably approximate principal inertia components.

02

CA embeddings help visualize classification boundaries and training dynamics.

03

The approach scales to high-dimensional, large datasets.

Abstract

Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies by finding maximally correlated embeddings of pairs of random variables. CA has found applications in fields ranging from epidemiology to social sciences; however, current methods do not scale to large, high-dimensional datasets. In this paper, we provide a novel interpretation of CA in terms of an information-theoretic quantity called the principal inertia components. We show that estimating the principal inertia components, which consists in solving a functional optimization problem over the space of finite-variance functions of two random variables, is equivalent to performing CA. We then leverage this insight to design novel algorithms to perform CA at an unprecedented scale. In particular, we demonstrate how the principal inertia components can be reliably approximated…
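For orientation, the classical matrix-based form of CA, which the abstract says does not scale, can be sketched on a small contingency table. This is not the paper's neural network algorithm. It is a minimal illustration that assumes the standard textbook formulation: the principal inertia components are taken as the squared singular values of the standardized residual matrix, and the embeddings are the corresponding principal coordinates. The function name `correspondence_analysis` and its parameter `k` are hypothetical, chosen only for this example.

```python
import numpy as np

def correspondence_analysis(counts, k=2):
    """Classical CA of a contingency table via SVD (illustrative sketch only).

    Assumes every row and every column of `counts` has a positive total.
    """
    P = np.asarray(counts, dtype=float)
    P = P / P.sum()                 # empirical joint distribution of (X, Y)
    px = P.sum(axis=1)              # marginal distribution of X (rows)
    py = P.sum(axis=0)              # marginal distribution of Y (columns)

    # Standardized residuals: D_x^{-1/2} (P - px py^T) D_y^{-1/2}
    Q = (P - np.outer(px, py)) / np.sqrt(np.outer(px, py))

    U, s, Vt = np.linalg.svd(Q, full_matrices=False)

    # Squared singular values, in decreasing order (assumed definition
    # of the principal inertia components for this sketch).
    pics = s[:k] ** 2

    # Principal coordinates: low-dimensional embeddings of the row and
    # column categories.
    row_coords = (U[:, :k] * s[:k]) / np.sqrt(px)[:, None]
    col_coords = (Vt.T[:, :k] * s[:k]) / np.sqrt(py)[:, None]
    return pics, row_coords, col_coords


if __name__ == "__main__":
    # Hypothetical 3x3 table of co-occurrence counts.
    table = [[20, 5, 1],
             [4, 18, 6],
             [2, 7, 25]]
    pics, rows, cols = correspondence_analysis(table, k=2)
    print("principal inertia components:", pics)
    print("row embeddings:\n", rows)
    print("column embeddings:\n", cols)
```

The SVD step works on a matrix whose size is the product of the two alphabet sizes, which is one way to see why this route becomes impractical for large, high-dimensional data.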
