In-Database Regression in Input Sparsity Time
Rajesh Jayaram, Alireza Samadian, David P. Woodruff, Peng Ye

TL;DR
This paper introduces fast input-sparsity-time algorithms for computing subspace embeddings directly from the tables of a database join, enabling efficient high-accuracy regression without explicitly materializing the (potentially very large) join matrix.
Contribution
It presents novel input-sparsity-time algorithms for computing subspace embeddings of join outputs, significantly reducing the computational time of regression tasks.
Findings
Algorithms run in time proportional to the number of non-zero entries in the input tables
Achieve high-accuracy regression faster than prior FAQ-based (Functional Aggregate Query) methods
Empirical results show substantial speedups on real datasets
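To make "time proportional to the number of non-zero entries" concrete, here is a minimal sketch of the classical CountSketch subspace embedding applied to an explicitly stored sparse matrix and used for sketch-and-solve least squares. It is not the paper's join-aware algorithm, which avoids materializing the matrix at all; it only illustrates the building block whose cost scales with nnz(A). The function names (`count_sketch`, `solve_least_squares`, `residual`), the sketch size `k = 200`, and the synthetic data are assumptions chosen for illustration.

```python
import random


def count_sketch(rows, b, k, d, seed=0):
    """Apply a CountSketch matrix S (k x n) to a sparse A (n x d) and to b.

    rows: list of n sparse rows, each a dict {column_index: value}.
    Row i is sent to a random bucket h(i) in [0, k) with a random sign
    s(i) in {+1, -1}; bucket h(i) accumulates s(i) * row_i. Only nonzero
    entries are touched, so the cost is O(nnz(A) + n).
    """
    rng = random.Random(seed)  # seeded RNG stands in for the hash functions h and s
    SA = [[0.0] * d for _ in range(k)]
    Sb = [0.0] * k
    for i, row in enumerate(rows):
        bucket = rng.randrange(k)
        sign = rng.choice((-1.0, 1.0))
        for j, v in row.items():
            SA[bucket][j] += sign * v
        Sb[bucket] += sign * b[i]
    return SA, Sb


def solve_least_squares(M, y):
    """Solve min_x ||M x - y||_2 via the normal equations (adequate for small d)."""
    d = len(M[0])
    G = [[sum(r[p] * r[q] for r in M) for q in range(d)] for p in range(d)]
    g = [sum(r[p] * yi for r, yi in zip(M, y)) for p in range(d)]
    # Gaussian elimination with partial pivoting on G x = g.
    for c in range(d):
        piv = max(range(c, d), key=lambda i: abs(G[i][c]))
        G[c], G[piv] = G[piv], G[c]
        g[c], g[piv] = g[piv], g[c]
        for i in range(c + 1, d):
            f = G[i][c] / G[c][c]
            for j in range(c, d):
                G[i][j] -= f * G[c][j]
            g[i] -= f * g[c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * d
    for i in reversed(range(d)):
        x[i] = (g[i] - sum(G[i][j] * x[j] for j in range(i + 1, d))) / G[i][i]
    return x


def residual(rows, b, x):
    """||A x - b||_2, evaluated over the sparse rows of A."""
    return sum((sum(x[j] * v for j, v in row.items()) - bi) ** 2
               for row, bi in zip(rows, b)) ** 0.5


# Synthetic sparse regression problem (illustrative data only).
data_rng = random.Random(1)
n, d = 2000, 3
x_true = [2.0, -1.0, 0.5]
rows, b = [], []
for _ in range(n):
    row = {j: data_rng.gauss(0, 1) for j in range(d) if data_rng.random() < 0.6}
    rows.append(row)
    b.append(sum(x_true[j] * v for j, v in row.items()) + data_rng.gauss(0, 0.1))

# Exact solution on the full n x d matrix vs. solution on the k x d sketch.
dense = [[row.get(j, 0.0) for j in range(d)] for row in rows]
x_exact = solve_least_squares(dense, b)
SA, Sb = count_sketch(rows, b, k=200, d=d)
x_sketch = solve_least_squares(SA, Sb)
print(residual(rows, b, x_exact), residual(rows, b, x_sketch))
```

The sketched system has only k = 200 rows instead of n = 2000, yet the residual of its solution should be close to the optimal one. The normal-equations solver is chosen for brevity; a QR-based solver would be numerically preferable for ill-conditioned problems.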
Abstract
Sketching is a powerful dimensionality reduction technique for accelerating algorithms for data analysis. A crucial step in sketching methods is to compute a subspace embedding (SE) for a large matrix $A \in \mathbb{R}^{N \times d}$. SEs are the primary tool for obtaining extremely efficient solutions for many linear-algebraic tasks, such as least squares regression and low rank approximation. Computing an SE often requires an explicit representation of $A$ and running time proportional to the size of $A$. However, if $A = T_1 \Join T_2 \Join \dots \Join T_m$ is the result of a database join query on several smaller tables $T_i \in \mathbb{R}^{n_i \times d_i}$, then this running time can be prohibitive, as $A$ itself can have as many as $O(n_1 n_2 \cdots n_m)$ rows. In this work, we design subspace…
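The point that the join matrix can be far larger than its input tables can be checked on a toy equi-join. The minimal sketch below assumes two tables joined on a single key column; the helper names `join_size` and `materialized_join` and the toy tables are hypothetical. It counts the rows of the join from per-key frequencies in $O(n_1 + n_2)$ time, without materializing the join. It only illustrates the size blow-up and is not the paper's algorithm.

```python
from collections import Counter
from itertools import product


def join_size(left_keys, right_keys):
    """Rows in the equi-join of two tables on one key column.

    Each key value k contributes count_left(k) * count_right(k) output rows,
    so the total is computed in O(n1 + n2) time without building the join.
    """
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    return sum(c * right_counts[k] for k, c in left_counts.items())


def materialized_join(T1, T2):
    """Naive nested-loop join on column 0, producing the explicit rows of A."""
    return [a + b[1:] for a, b in product(T1, T2) if a[0] == b[0]]


# Toy tables: (key, feature) rows.
T1 = [(i % 3, float(i)) for i in range(30)]
T2 = [(i % 3, float(-i)) for i in range(20)]
print(join_size([r[0] for r in T1], [r[0] for r in T2]))  # 200
print(len(materialized_join(T1, T2)))                      # 200

# Worst case: every row shares one key, so n1 = n2 = 1000 gives n1 * n2 rows.
print(join_size([0] * 1000, [0] * 1000))                   # 1000000
```

If all m tables share a single key value, the same reasoning gives $n_1 n_2 \cdots n_m$ output rows, which is the blow-up the abstract refers to.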
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms
