Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices

Yasuo Tabei; Hiroto Saigo; Yoshihiro Yamanishi; Simon J. Puglisi

arXiv:1606.05031 · cs.DS · June 17, 2016 · 2 cites

TL;DR

This paper introduces cPLS, a scalable partial least squares regression algorithm that learns interpretable models from massive high-dimensional data. It uses grammar compression to significantly reduce computational costs while maintaining high accuracy.

Contribution

The paper presents a novel grammar-compressed representation of data matrices and cPLS, a scalable algorithm built on it that improves efficiency and interpretability for large high-dimensional datasets.
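The listing below is a minimal sketch of how a grammar-compressed matrix can still answer row queries and matrix-vector products without being fully decompressed. It assumes a binary matrix whose rows are stored as sorted lists of nonzero column indices, and it uses a naive Re-Pair-style construction (repeatedly replace the most frequent adjacent pair of symbols with a new nonterminal). The function names `repair_compress`, `get_row`, `compressed_matvec` and `compressed_rmatvec` are hypothetical; the paper's actual representation, including how it supports fast column access, may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple

Rules = Dict[int, Tuple[int, int]]  # nonterminal Z -> pair (A, B), i.e. Z -> AB


def repair_compress(rows: List[List[int]], n_cols: int) -> Tuple[List[List[int]], Rules]:
    """Hypothetical naive Re-Pair-style grammar shared by all rows.
    Terminals are column ids 0..n_cols-1; nonterminals get ids n_cols, n_cols+1, ...
    Pairs never span two rows."""
    seqs = [list(r) for r in rows]
    rules: Rules = {}
    next_sym = n_cols
    while True:
        counts = Counter(p for s in seqs for p in zip(s, s[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:                      # no pair repeats: the grammar is final
            break
        rules[next_sym] = pair
        for k, s in enumerate(seqs):      # replace non-overlapping occurrences, left to right
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and (s[i], s[i + 1]) == pair:
                    out.append(next_sym)
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            seqs[k] = out
        next_sym += 1
    return seqs, rules


def expand(sym: int, rules: Rules, n_cols: int) -> List[int]:
    """Expand one symbol back into the column ids it derives."""
    if sym < n_cols:
        return [sym]
    a, b = rules[sym]
    return expand(a, rules, n_cols) + expand(b, rules, n_cols)


def get_row(seqs: List[List[int]], rules: Rules, n_cols: int, i: int) -> List[int]:
    """Row access: decompress only row i."""
    return [c for s in seqs[i] for c in expand(s, rules, n_cols)]


def compressed_matvec(seqs: List[List[int]], rules: Rules, v: List[float]) -> List[float]:
    """X v for binary X: val[Z] is the sum of v over the expansion of Z."""
    val = list(v) + [0.0] * len(rules)
    for z in sorted(rules):               # children are always created before parents
        a, b = rules[z]
        val[z] = val[a] + val[b]
    return [sum(val[s] for s in seq) for seq in seqs]


def compressed_rmatvec(seqs: List[List[int]], rules: Rules, n_cols: int,
                       u: List[float]) -> List[float]:
    """X^T u for binary X: push row weights from nonterminals down to columns."""
    weight = [0.0] * (n_cols + len(rules))
    for ui, seq in zip(u, seqs):
        for s in seq:
            weight[s] += ui
    for z in sorted(rules, reverse=True):  # visit parents before their children
        a, b = rules[z]
        weight[a] += weight[z]
        weight[b] += weight[z]
    return weight[:n_cols]


rows = [[0, 1, 2, 3], [0, 1, 2, 4], [1, 2, 3], [0, 1, 2, 3]]
seqs, rules = repair_compress(rows, n_cols=5)
print(compressed_matvec(seqs, rules, [1.0, 2.0, 3.0, 4.0, 5.0]))  # [10.0, 11.0, 9.0, 10.0]
```

Because every rule Z → AB is created after its children, one pass over the rules in creation order computes the sum of v over each nonterminal's expansion, so X v costs time proportional to the grammar size plus the compressed row lengths instead of the uncompressed number of nonzeros. X^T u runs the same kind of pass in reverse order.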

Findings

1. cPLS outperforms existing methods in prediction accuracy.
2. cPLS significantly reduces computational time and memory usage.
3. cPLS maintains high interpretability in models learned from massive data.

Abstract

With massive high-dimensional data now commonplace in research and industry, there is a strong and growing demand for more scalable computational techniques for data analysis and knowledge discovery. Key to turning these data into knowledge is the ability to learn statistical models with high interpretability. Current methods for learning statistical models either produce models that are not interpretable or have prohibitive computational costs when applied to massive data. In this paper we address this need by presenting a scalable algorithm for partial least squares regression (PLS), which we call compression-based PLS (cPLS), to learn predictive linear models with high interpretability from massive high-dimensional data. We propose a novel grammar-compressed representation of data matrices that supports fast row and column access while the data matrix is in a compressed form. The…
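As a rough illustration of why fast access to a compressed matrix can be enough for PLS, the sketch below fits a PLS1 model (single response) while touching X only through the products X v and X^T u. Instead of deflating X, it deflates the response and orthogonalizes each new score vector against the earlier ones, which in exact arithmetic is equivalent to classical NIPALS with X-deflation. This is a generic pure-Python sketch, not the authors' cPLS implementation; `pls1_matvec` is a hypothetical name, and it assumes X and y are already centered (no intercept is fitted).

```python
from typing import Callable, List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def pls1_matvec(matvec: Callable[[Vec], Vec], rmatvec: Callable[[Vec], Vec],
                y: Vec, n_components: int) -> Vec:
    """Hypothetical PLS1 sketch: X is used only via matvec(v) = X v and
    rmatvec(u) = X^T u. Assumption: X and y are pre-centered (no intercept)."""
    scores: List[Vec] = []   # mutually orthogonal score vectors t_j
    rots: List[Vec] = []     # vectors r_j with t_j = X r_j
    beta: Vec = []
    resid = list(y)          # deflated response y_k
    for _ in range(n_components):
        w = rmatvec(resid)   # weight direction X^T y_k
        norm = dot(w, w) ** 0.5
        if norm == 0.0:      # nothing left that X can explain
            break
        w = [wi / norm for wi in w]
        if not beta:
            beta = [0.0] * len(w)
        t, r = matvec(w), list(w)
        # Gram-Schmidt against earlier scores; mirror each step on r so t = X r holds
        for t_prev, r_prev in zip(scores, rots):
            coef = dot(t_prev, t) / dot(t_prev, t_prev)
            t = [a - coef * b for a, b in zip(t, t_prev)]
            r = [a - coef * b for a, b in zip(r, r_prev)]
        tt = dot(t, t)
        if tt == 0.0:        # the new direction yields no new score; stop
            break
        c = dot(t, resid) / tt                         # regress y_k on t
        resid = [a - c * b for a, b in zip(resid, t)]  # deflate the response only
        beta = [b + c * ri for b, ri in zip(beta, r)]  # keeps X beta = sum_j c_j t_j
        scores.append(t)
        rots.append(r)
    return beta


# Toy usage: a small dense matrix stands in for a compressed one.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2 * row[0] - row[1] for row in X]  # y = 2*x1 - x2


def dense_matvec(v: Vec) -> Vec:
    return [dot(row, v) for row in X]


def dense_rmatvec(u: Vec) -> Vec:
    return [sum(X[i][j] * u[i] for i in range(len(X))) for j in range(len(X[0]))]


beta = pls1_matvec(dense_matvec, dense_rmatvec, y, n_components=2)
print(beta)  # approximately [2.0, -1.0]
```

Here the Krylov vectors X^T y and (X^T X) X^T y already span both feature directions, so two components reproduce the least-squares fit and recover the coefficients used to generate y; with one component the sketch returns a lower-dimensional fit instead.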

Taxonomy

Topics: Algorithms and Data Compression · Neural Networks and Applications · Blind Source Separation Techniques