Sparse Correspondence Analysis for Contingency Tables

Ruiping Liu; Ndeye Niang; Gilbert Saporta; Huiwen Wang

arXiv:2012.04271 · stat.ME · December 9, 2020 · 1 citation

TL;DR

This paper introduces sparse variants of correspondence analysis for large contingency tables, enabling feature selection and interpretability in text mining applications through a novel sparsification technique.

Contribution

The paper proposes sparse correspondence analysis methods with a mechanism for tuning the sparsity level, extending s-PCA concepts to contingency tables through a new deflation technique.

Findings

01 Enables sparsity in rows and columns of contingency tables

02 Provides a tuning method for sparsity levels

03 Improves interpretability of large text data matrices

Abstract

Since the introduction of the lasso in regression, various sparse methods have been developed for unsupervised settings, such as sparse principal component analysis (s-PCA), sparse canonical correlation analysis (s-CCA) and sparse singular value decomposition (s-SVD). These sparse methods combine feature selection and dimension reduction. One advantage of s-PCA is that it simplifies the interpretation of the (pseudo) principal components, since each one is expressed as a linear combination of a small number of variables. The disadvantages are, on the one hand, the difficulty of choosing the number of non-zero coefficients in the absence of a well-established criterion and, on the other hand, the loss of orthogonality for the components and/or the loadings. In this paper we propose sparse variants of correspondence analysis (CA) for large contingency tables like document-term matrices used in…
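To make the idea concrete, here is a minimal sketch of how sparsity can be introduced into CA. It is not the authors' algorithm, and it does not reproduce their deflation technique or their tuning method. It relies on the standard fact that CA amounts to an SVD of the matrix of standardized residuals of the contingency table, and it swaps in a generic sparse-SVD device (alternating soft-thresholded power iterations) to get one sparse pair of row and column weight vectors. The function names, the thresholds `lam_u` and `lam_v`, and the toy document-term table are all assumptions made for illustration.

```python
import numpy as np

def standardized_residuals(N):
    """Return S = D_r^{-1/2} (P - r c^T) D_c^{-1/2} and the row/column masses.

    Assumes every row and column of N has a positive total.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return S, r, c

def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam; entries with |x| <= lam become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank1(S, lam_u=0.0, lam_v=0.0, n_iter=200, tol=1e-10):
    """One sparse singular pair via alternating soft-thresholded power steps.

    Illustrative stand-in for a sparse SVD step, not the paper's method.
    With lam_u = lam_v = 0 it returns the ordinary leading singular triplet.
    """
    U, _, Vt = np.linalg.svd(S, full_matrices=False)
    u, v = U[:, 0], Vt[0]                # start from the dense solution
    for _ in range(n_iter):
        v_new = soft_threshold(S.T @ u, lam_v)
        nv = np.linalg.norm(v_new)
        if nv == 0:                      # threshold too large: keep last iterate
            break
        v_new = v_new / nv
        u_new = soft_threshold(S @ v_new, lam_u)
        nu = np.linalg.norm(u_new)
        if nu == 0:
            break
        u_new = u_new / nu
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    sigma = float(u @ S @ v)
    return u, sigma, v

# Hypothetical 4-document x 4-term count table.
N = np.array([[10, 2, 0, 1],
              [8, 3, 1, 0],
              [1, 0, 9, 7],
              [0, 1, 8, 9]])
S, r, c = standardized_residuals(N)
u, sigma, v = sparse_rank1(S, lam_u=0.05, lam_v=0.05)
print("non-zero row weights:", np.count_nonzero(u))
print("non-zero column weights:", np.count_nonzero(v))
```

Larger thresholds zero out more rows and columns at the cost of a smaller `sigma`, which illustrates the difficulty noted in the abstract of choosing the sparsity level. Successive sparse pairs obtained this way are in general no longer orthogonal.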

Taxonomy

Topics: Sensory Analysis and Statistical Methods · Spectroscopy and Chemometric Analyses · Text and Document Classification Technologies