In-Database Regression in Input Sparsity Time

Rajesh Jayaram; Alireza Samadian; David P. Woodruff; Peng Ye

arXiv:2107.05672 · cs.DS · July 14, 2021 · 1 citation

TL;DR

This paper introduces input-sparsity-time algorithms for computing subspace embeddings directly from the tables of a database join, enabling fast, high-accuracy regression without explicitly materializing the (potentially very large) joined matrix.

Contribution

It presents novel input-sparsity-time algorithms for computing subspace embeddings of join outputs, reducing the running time of regression on joined tables.

Findings

01. Algorithms run in time proportional to the number of non-zero entries in the input tables.
02. Achieves high-accuracy regression faster than prior FAQ-based (Functional Aggregate Query) methods.
03. Empirical results show substantial speedups on real datasets.
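The first finding contrasts running time in the input tables with the size of the join itself. The gap is easy to see by counting join rows without building them. The sketch below is hypothetical: `join_size` is not from the paper's repository, and it assumes for simplicity that every table joins on a single shared key column (general join queries need more machinery).

```python
from collections import Counter
from math import prod


def join_size(*key_columns):
    """Count rows of T1 ⋈ ... ⋈ Tm without materializing the join.

    Assumption (illustrative only): all tables join on one shared key
    column, so a key k contributes count_1(k) * ... * count_m(k) rows.
    """
    counts = [Counter(col) for col in key_columns]       # key -> row count, per table
    common = set(counts[0]).intersection(*counts[1:])    # keys present in every table
    return sum(prod(c[k] for c in counts) for k in common)


# Worst case: three 100-row tables whose rows all share one key value.
print(join_size([0] * 100, [0] * 100, [0] * 100))  # 1000000
# Mostly distinct keys: only keys 3 and 4 match, one row each.
print(join_size(range(5), range(3, 8)))            # 2
```

Three tables with 300 rows in total already produce a million joined rows, which is why an algorithm that must read every row of the joined matrix can be far slower than one whose cost depends only on the input tables.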

Abstract

Sketching is a powerful dimensionality reduction technique for accelerating algorithms for data analysis. A crucial step in sketching methods is to compute a subspace embedding (SE) for a large matrix $A \in \mathbb{R}^{N \times d}$. SEs are the primary tool for obtaining extremely efficient solutions for many linear-algebraic tasks, such as least squares regression and low rank approximation. Computing an SE often requires an explicit representation of $A$ and running time proportional to the size of $A$. However, if $A = T_{1} \bowtie T_{2} \bowtie \dots \bowtie T_{m}$ is the result of a database join query on several smaller tables $T_{i} \in \mathbb{R}^{n_{i} \times d_{i}}$, then this running time can be prohibitive, as $A$ itself can have as many as $O(n_{1} n_{2} \cdots n_{m})$ rows. In this work, we design subspace…
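To make the abstract's notion of a subspace embedding concrete, here is a minimal sketch of generic sketch-and-solve least squares with a CountSketch matrix $S$, applied to an explicitly stored $A$. It is not the paper's join-aware algorithm, whose stated aim is to avoid forming $A$ at all; the function names `countsketch` and `sketch_and_solve`, the sketch size `m`, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def countsketch(A, b, m, rng):
    """Apply a CountSketch S (m x N) to A and b.

    Each row i is hashed to one bucket h(i) and multiplied by a random
    sign s(i); S @ A is the per-bucket sum of signed rows, so every entry
    of A is touched exactly once.
    """
    N = A.shape[0]
    h = rng.integers(0, m, size=N)          # bucket index for each row
    s = rng.choice([-1.0, 1.0], size=N)     # random sign for each row
    SA = np.zeros((m, A.shape[1]))
    Sb = np.zeros(m)
    np.add.at(SA, h, s[:, None] * A)        # accumulate signed rows of A
    np.add.at(Sb, h, s * b)                 # same hashing applied to b
    return SA, Sb


def sketch_and_solve(A, b, m, seed=0):
    """Solve min_x ||S A x - S b|| instead of the full problem."""
    rng = np.random.default_rng(seed)
    SA, Sb = countsketch(A, b, m, rng)
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x


# Synthetic example (assumed data, for illustration only).
rng = np.random.default_rng(1)
N, d = 20000, 10
A = rng.standard_normal((N, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketch_and_solve(A, b, m=2000)

# Ratio of residual norms: >= 1, and close to 1 when m is large enough.
ratio = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b)
print(ratio)
```

Because $S$ has a single nonzero per column, computing $SA$ costs time linear in the number of nonzero entries of $A$; the difficulty the abstract points to is that, for a join, even reading $A$ once can take $O(n_{1} n_{2} \cdots n_{m})$ time.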

Code & Models

Repositories

AnonymousFireman/ICML_code (Official)

Videos

In-Database Regression in Input Sparsity Time · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms