Matrix Product Sketching via Coordinated Sampling

Majid Daliri; Juliana Freire; Danrong Li; Christopher Musco

arXiv:2501.17836 · cs.DS · January 30, 2025

TL;DR

This paper introduces a coordinated sampling method for matrix product approximation that outperforms classical linear sketching techniques in sparse settings, with practical benefits demonstrated in distributed regression and language models.

Contribution

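The paper presents a coordinated sampling approach to matrix product sketching. In this approach, $A$ and $B$ are sketched independently except for a shared random seed. The paper shows that this approach needs smaller sketches than classical linear sketching methods when the matrices are sparse.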

Findings

01

For a target Frobenius norm error of $\epsilon \|A\|_F \|B\|_F$, coordinated sampling needs sketches of size $O(s/\epsilon^2)$ when $A$ and $B$ have at most $s$ non-zeros per row.

02

Empirical results show an order of magnitude improvement in real applications.

03

The method outperforms classical linear sketching in distributed regression and language model applications.

Abstract

We revisit the well-studied problem of approximating a matrix product, $A^{T} B$, based on small-space sketches $S(A)$ and $S(B)$ of $A \in \mathbb{R}^{n \times d}$ and $B \in \mathbb{R}^{n \times m}$. We are interested in the setting where the sketches must be computed independently of each other, except for the use of a shared random seed. We prove that, when $A$ and $B$ are sparse, methods based on coordinated random sampling can outperform classical linear sketching approaches, like Johnson-Lindenstrauss Projection or CountSketch. For example, to obtain Frobenius norm error $\epsilon \|A\|_{F} \|B\|_{F}$, coordinated sampling requires sketches of size $O(s/\epsilon^{2})$ when $A$ and $B$ have at most $s \leq d, m$ non-zeros per row. In contrast, linear sketching leads to…
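The following is a minimal sketch of the coordinated-sampling idea, under stated assumptions. Each party sketches its own matrix using only a shared seed. The product is then estimated from the rows that appear in both sketches.

The following pieces are hypothetical choices for illustration. They are not the paper's exact algorithm or sketch-size accounting:

- the hash-based `shared_uniform`;
- the row-norm-proportional keep probabilities $p_i = \min(1, k\|a_i\|^2/\|A\|_F^2)$;
- the names `threshold_sketch` and `estimate_product`.

```python
import hashlib


def shared_uniform(seed: int, i: int) -> float:
    """Hypothetical shared hash: map (seed, row index i) to a value in [0, 1).

    Both parties compute this independently; equal (seed, i) always give the
    same value, which is what coordinates the two samples.
    """
    digest = hashlib.sha256(f"{seed}:{i}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def threshold_sketch(M, k, seed):
    """Sketch matrix M (list of rows) by coordinated threshold sampling.

    Assumed rule (illustrative only): keep row i when
    shared_uniform(seed, i) <= p_i, with p_i = min(1, k * ||m_i||^2 / ||M||_F^2).
    Returns {i: (row, p_i)} for the kept rows.
    """
    fro2 = sum(x * x for row in M for x in row)
    sketch = {}
    for i, row in enumerate(M):
        norm2 = sum(x * x for x in row)
        if norm2 == 0:
            continue  # zero rows contribute nothing to A^T B
        p = min(1.0, k * norm2 / fro2)
        if shared_uniform(seed, i) <= p:
            sketch[i] = (row, p)
    return sketch


def estimate_product(sk_a, sk_b, d, m):
    """Estimate the d x m matrix A^T B from two independently built sketches."""
    est = [[0.0] * m for _ in range(d)]
    for i in sk_a.keys() & sk_b.keys():
        a_row, p_a = sk_a[i]
        b_row, p_b = sk_b[i]
        # Row i is in both sketches iff h(i) <= min(p_a, p_b), so reweighting
        # by 1 / min(p_a, p_b) makes the estimate unbiased.
        w = 1.0 / min(p_a, p_b)
        for r in range(d):
            if a_row[r] == 0:
                continue  # skip zero entries of the sparse row
            for c in range(m):
                est[r][c] += w * a_row[r] * b_row[c]
    return est


# Usage: the two sketches share only the seed.
A = [[1, 0], [0, 2], [3, 0]]  # n=3, d=2
B = [[1], [1], [0]]           # n=3, m=1
print(estimate_product(threshold_sketch(A, 10**9, 7), threshold_sketch(B, 10**9, 7), 2, 1))
# → [[1.0], [2.0]]
```

Both `threshold_sketch` calls use the same `seed`, so they make correlated keep/drop decisions without communicating. If `k` is large enough that every $p_i = 1$, every non-zero row is kept and the estimate equals $A^T B$ exactly. Smaller `k` trades accuracy for sketch size.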

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating: 6 · Confidence: 4

Strengths

1. I found the presentation of this paper to be very clear and engaging. The problem setting, requirements, and notation are all well-defined, and the core idea is explained thoroughly, making it easy for readers to follow. The proof is presented in a clean and structured manner, enhancing readability.
2. The paper provides strong motivation for studying the problem of computing independent sketches and discusses several potential applications, demonstrating their proposed algorithm in a one im…

Weaknesses

1. One concern I have regarding the experiments is that while vector quantization—a nonlinear compression technique—has been widely studied and applied in practice for approximating the computation of the key matrix in the attention layer, it remains unclear whether using linear compression techniques, such as approximate matrix products, to approximate $QK^T$ or just the key matrix $K$ could degrade model performance significantly in downstream applications. I suggest that the authors cite work
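The reviewer's $QK^T$ example can be written in the $A^T B$ form used in the abstract. As a hedged illustration, assume $Q \in \mathbb{R}^{N \times h}$ and $K \in \mathbb{R}^{N' \times h}$ hold query and key vectors as rows; this notation is not taken from the paper. Set $A = Q^T$ and $B = K^T$, so that the shared dimension $n$ becomes the head dimension $h$:

$$
QK^{T} = (Q^{T})^{T} K^{T} = A^{T} B = \sum_{j=1}^{h} q^{(j)} \big(k^{(j)}\big)^{T},
\qquad A = Q^{T} \in \mathbb{R}^{h \times N},\; B = K^{T} \in \mathbb{R}^{h \times N'}
$$

Here $q^{(j)}$ and $k^{(j)}$ are the $j$-th columns of $Q$ and $K$, which are also the $j$-th rows of $A$ and $B$. Under this mapping, sampling rows of $A$ and $B$ means sampling coordinates of the head dimension. The paper's own attention experiments may be set up differently.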

Reviewer 02 · Rating: 6 · Confidence: 3

Strengths

S1: Interesting problem.
S2: Elegant solutions.
S3: Solid experiments.

Weaknesses

W1: The result and the approach are not very surprising, given the prior work of Bessa et al. and Daliri et al.
W2: The analysis of one of the algorithms (Threshold Sampling) seems fairly straightforward.
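For context on W2, here is a hedged sketch of the core unbiasedness argument for a coordinated threshold-sampling estimator. For illustration only, assume a shared hash value $h(i) \sim \mathrm{Unif}[0,1]$ decides which rows are kept:

- row $i$ of $A$ is kept when $h(i) \le p_i$;
- row $i$ of $B$ is kept when $h(i) \le q_i$.

The paper's exact probabilities may differ, and this argument does not reproduce the $O(s/\epsilon^2)$ size bound. Write $a_i$ and $b_i$ for the $i$-th rows of $A$ and $B$ (as column vectors), so that $A^T B = \sum_{i=1}^{n} a_i b_i^T$. Let $S_A$ and $S_B$ be the kept index sets. Then:

$$
W = \sum_{i \in S_A \cap S_B} \frac{a_i b_i^{T}}{\min(p_i, q_i)}, \qquad
\mathbb{E}[W] = \sum_{i=1}^{n} \frac{\Pr[h(i) \le \min(p_i, q_i)]}{\min(p_i, q_i)}\, a_i b_i^{T} = \sum_{i=1}^{n} a_i b_i^{T} = A^{T} B
$$

This holds whenever $\min(p_i, q_i) > 0$ for every $i$ with $a_i b_i^T \neq 0$. Because both sketches use the same $h$, a row lands in both samples with probability $\min(p_i, q_i)$. With independent sampling, that probability would be the smaller $p_i q_i$.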

Reviewer 03 · Rating: 6 · Confidence: 4

Strengths

- The theoretical analysis of this paper is solid. The paper gives a new sketching algorithm with size $O(s^2/\epsilon^2)$. This bound is better for sparse matrices than those of previous methods, which is interesting to me.
- The paper gives detailed experiments that demonstrate the advantage of the proposed algorithms.
- The presentation of the paper is good. The paper has a nice introduction section.

Weaknesses

- I still do not fully understand the motivation for the new model the paper discusses (see the questions below). Could the authors explain this further?
- It would be better if the experiments also included a comparison to the previous sampling-based method.

Videos

Matrix Product Sketching via Coordinated Sampling · SlidesLive

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Motion and Animation · Manufacturing Process and Optimization

Methods: Softmax · Attention Is All You Need · Linear Regression