On b-bit min-wise hashing for large-scale regression and classification with sparse data
Rajen D. Shah, Nicolai Meinshausen

TL;DR
This paper investigates the use of b-bit min-wise hashing for dimension reduction in large-scale sparse regression and classification, providing theoretical bounds on prediction error and demonstrating its effectiveness for various models.
Contribution
The work derives prediction error bounds for b-bit min-wise hashing in large-scale sparse settings and shows its applicability to linear, logistic, and interaction models.
Findings
Prediction error vanishes asymptotically under certain sparsity conditions.
Ordinary least squares and ridge regression can be effectively applied to reduced data.
Non-asymptotic bounds are provided for complex models with interactions.
Abstract
Large-scale regression problems where both the number of variables, p, and the number of observations, n, may be large and in the order of millions or more, are becoming increasingly more common. Typically the data are sparse: only a fraction of a percent of the entries in the design matrix are non-zero. Nevertheless, often the only computationally feasible approach is to perform dimension reduction to obtain a new design matrix with far fewer columns and then work with this compressed data. b-bit min-wise hashing (Li and Konig, 2011) is a promising dimension reduction scheme for sparse matrices which produces a set of random features such that regression on the resulting design matrix approximates a kernel regression with the resemblance kernel. In this work, we derive bounds on the prediction error of such regressions. For both linear and logistic models we show that the…
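To make the reduction step in the abstract concrete, here is a minimal sketch of b-bit min-wise hashing followed by ridge regression on the compressed matrix. It follows the general recipe attributed to Li and Konig (random permutation, minimum permuted index, lowest b bits, one-hot expansion); the exact variant analysed in the paper may differ. The function names `bbit_minwise_features` and `ridge_fit`, the parameters `L`, `lam`, and `seed`, and the toy data are all assumptions made for illustration, and the design matrix is assumed to be binary and given as lists of non-zero column indices.

```python
import numpy as np


def bbit_minwise_features(rows, p, L=64, b=1, seed=0):
    """Sketch of b-bit min-wise hashing (illustrative, not the paper's code).

    rows : iterable of iterables holding the non-zero column indices of each
           observation in a binary sparse design matrix with p columns.
    Returns a dense n x (L * 2**b) matrix with one 1 per block per non-empty row.
    """
    rng = np.random.default_rng(seed)
    rows = list(rows)
    width = 2 ** b
    S = np.zeros((len(rows), L * width))
    for l in range(L):
        perm = rng.permutation(p)  # one random permutation of the p columns
        for i, idx in enumerate(rows):
            idx = np.fromiter(idx, dtype=np.int64)
            if idx.size == 0:
                continue  # empty row: leave this block all-zero
            m = perm[idx].min()    # min-wise hash of the row's support
            low = m & (width - 1)  # keep only the lowest b bits
            S[i, l * width + low] = 1.0  # one-hot encode within block l
    return S


def ridge_fit(S, y, lam=1.0):
    """Closed-form ridge regression on the compressed design matrix S."""
    d = S.shape[1]
    return np.linalg.solve(S.T @ S + lam * np.eye(d), S.T @ y)


# Toy usage with made-up data: three sparse rows out of p = 1000 columns.
rows = [[0, 3, 7], [0, 3, 8], [100, 200]]
S = bbit_minwise_features(rows, p=1000, L=64, b=1)
beta = ridge_fit(S, np.array([1.0, 1.0, 0.0]))
print(S.shape, beta.shape)
```

The inner product of two rows of `S`, divided by `L`, estimates how often their b-bit hashes collide. Without truncation, the collision probability of two min-wise hashes equals the resemblance (Jaccard similarity) of the two supports, which is why a linear fit on `S` behaves like a regression with the resemblance kernel; truncating to b bits adds some chance collisions on top of that.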
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Face and Expression Recognition · Advanced Image and Video Retrieval Techniques
