Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices
Yasuo Tabei, Hiroto Saigo, Yoshihiro Yamanishi, Simon J. Puglisi

TL;DR
This paper introduces cPLS, a scalable and interpretable partial least squares regression algorithm that handles massive high-dimensional data efficiently by operating on a grammar-compressed data matrix. This substantially reduces computational cost while maintaining high accuracy.
Contribution
The paper presents a novel grammar-compressed data representation and a scalable cPLS algorithm that improves efficiency and interpretability for large high-dimensional datasets.
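The summary describes the grammar-compressed representation only at a high level. As a rough illustration, the following sketch assumes a binary matrix whose rows are stored as sorted lists of nonzero column indices, and compresses them with a greedy Re-Pair-style pair replacement. Re-Pair is a standard grammar-compression technique swapped in here as an assumption; the paper's actual encoding may differ. The sketch also shows why a grammar helps: the products X w and X^T u can be computed by evaluating each rule once instead of decompressing every row. All function names are hypothetical.

```python
from collections import Counter


def repair_compress(rows, num_cols):
    """Greedy Re-Pair-style grammar compression of sparse binary rows.

    Each row is a sorted list of column indices that hold a 1. Terminals are
    the column ids 0..num_cols-1; every new rule gets the next id >= num_cols.
    Returns (compressed rows, rules) with rules[nt] = (left, right).
    """
    seqs = [list(r) for r in rows]
    rules = {}
    next_id = num_cols
    while True:
        # Count adjacent symbol pairs over all rows.
        counts = Counter()
        for s in seqs:
            counts.update(zip(s, s[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:  # a pair seen only once gives no saving
            break
        rules[next_id] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        for i, s in enumerate(seqs):
            out, j = [], 0
            while j < len(s):
                if j + 1 < len(s) and (s[j], s[j + 1]) == pair:
                    out.append(next_id)
                    j += 2
                else:
                    out.append(s[j])
                    j += 1
            seqs[i] = out
        next_id += 1
    return seqs, rules


def expand(symbol, rules, num_cols):
    """Return the column indices that a symbol derives."""
    if symbol < num_cols:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules, num_cols) + expand(right, rules, num_cols)


def decompress_row(seq, rules, num_cols):
    """Row access: recover a row's column indices from its compressed form."""
    return [c for sym in seq for c in expand(sym, rules, num_cols)]


def compressed_matvec(seqs, rules, num_cols, w):
    """X @ w without decompressing: one addition per rule, then per symbol."""
    value = {c: w[c] for c in range(num_cols)}
    for nt in sorted(rules):  # a rule only refers to smaller ids
        left, right = rules[nt]
        value[nt] = value[left] + value[right]
    return [sum(value[sym] for sym in seq) for seq in seqs]


def compressed_rmatvec(seqs, rules, num_cols, u):
    """X^T @ u: accumulate row weights on symbols, then push them down."""
    weight = {}
    for ui, seq in zip(u, seqs):
        for sym in seq:
            weight[sym] = weight.get(sym, 0.0) + ui
    for nt in sorted(rules, reverse=True):  # parents before their children
        if nt in weight:
            left, right = rules[nt]
            w = weight.pop(nt)
            weight[left] = weight.get(left, 0.0) + w
            weight[right] = weight.get(right, 0.0) + w
    return [weight.get(c, 0.0) for c in range(num_cols)]
```

For example, the rows `[[0, 1, 2], [0, 1, 3], [0, 1, 2, 3]]` compress to two rules (`4 -> (0, 1)`, `5 -> (4, 2)`) and the sequences `[[5], [4, 3], [5, 3]]`. Because each rule is evaluated once per product, the cost of `compressed_matvec` grows with the grammar size and the compressed row lengths rather than with the number of nonzeros.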
Findings
cPLS outperforms existing methods in prediction accuracy
cPLS significantly reduces computational time and memory usage
cPLS maintains high interpretability in models learned from massive data
Abstract
With massive high-dimensional data now commonplace in research and industry, there is a strong and growing demand for more scalable computational techniques for data analysis and knowledge discovery. Key to turning these data into knowledge is the ability to learn statistical models with high interpretability. Current methods for learning statistical models either produce models that are not interpretable or have prohibitive computational costs when applied to massive data. In this paper we address this need by presenting a scalable algorithm for partial least squares regression (PLS), which we call compression-based PLS (cPLS), to learn predictive linear models with high interpretability from massive high-dimensional data. We propose a novel grammar-compressed representation of data matrices that supports fast row and column access while the data matrix remains in compressed form. The…
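The abstract does not spell out the regression step. One way to see why fast row and column access can be enough is that PLS1 (a single response) can be written so that X is touched only through the products X v and X^T u. The following is a minimal pure-Python PLS1 under stated assumptions. It deflates y and orthogonalizes the scores with Gram-Schmidt instead of deflating X; for PLS1 this gives the same score directions as the classical NIPALS recursion while never modifying X. Whether cPLS uses exactly this update is an assumption. X is not centered here, and all function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_ops(X):
    """Matrix-vector products for a small dense list-of-lists X (demo only)."""
    def matvec(v):
        return [dot(row, v) for row in X]

    def rmatvec(u):
        return [sum(u[i] * X[i][j] for i in range(len(X))) for j in range(len(X[0]))]

    return matvec, rmatvec


def pls1_fit(matvec, rmatvec, n_cols, y, n_components):
    """Matrix-free PLS1: X is only used through X @ v and X^T @ u.

    Assumption: y is centered and X is used as-is (uncentered), so the model
    is y ~ intercept + X @ beta. Returns (intercept, beta).
    """
    intercept = sum(y) / len(y)
    yc = [v - intercept for v in y]
    r = list(yc)                 # part of y not yet explained by the scores
    scores, coefs = [], []       # orthonormal scores t_j, with t_j = X @ z_j
    beta = [0.0] * n_cols
    for _ in range(n_components):
        w = rmatvec(r)           # weight vector X^T r
        t = matvec(w)            # raw score X w
        z = list(w)
        for t_j, z_j in zip(scores, coefs):
            # Modified Gram-Schmidt; z is updated alongside so t == X @ z holds.
            proj = dot(t_j, t)
            t = [a - proj * b for a, b in zip(t, t_j)]
            z = [a - proj * b for a, b in zip(z, z_j)]
        norm = dot(t, t) ** 0.5
        if norm < 1e-12:         # no new direction left; stop early
            break
        t = [a / norm for a in t]
        z = [a / norm for a in z]
        scores.append(t)
        coefs.append(z)
        c = dot(t, yc)           # least-squares coefficient of y on the score
        r = [a - c * b for a, b in zip(r, t)]
        beta = [a + c * b for a, b in zip(beta, z)]
    return intercept, beta


def pls1_predict(matvec, intercept, beta):
    return [intercept + v for v in matvec(beta)]
```

With as many components as the data supports, the fit reduces to least squares of the centered y on X; fewer components give the usual PLS regularization. In a compressed setting, `matvec` and `rmatvec` would be replaced by products computed directly on the compressed matrix, and the learned `beta` stays one weight per column, which is what keeps the linear model interpretable.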
Taxonomy
Topics: Algorithms and Data Compression · Neural Networks and Applications · Blind Source Separation Techniques
