When big data actually are low-rank, or entrywise approximation of certain function-generated matrices

Stanislav Budzinskiy

arXiv:2407.03250 · math.NA · September 9, 2025

TL;DR

This paper clarifies misconceptions about low-rank approximations of function-generated matrices, showing that certain classes can be approximated accurately with low rank independent of data dimension, with implications for big data and neural networks.

Contribution

The paper provides a theoretical explanation for when function-generated matrices can be approximated with low rank independent of the ambient dimension, extending to tensor-train formats.

Findings

01 Low-rank approximation is possible for specific function classes.

02 Rank grows logarithmically with the matrix size $n$ and is independent of the dimension $m$.

03 Results apply to big data matrices and neural network attention mechanisms.

Abstract

The article concerns low-rank approximation of matrices generated by sampling a smooth function of two $m$-dimensional variables. We identify several misconceptions surrounding a claim that, for a specific class of analytic functions, such $n \times n$ matrices admit accurate entrywise approximation of rank that is independent of $m$ and grows as $\log(n)$ -- colloquially known as "big-data matrices are approximately low-rank". We provide a theoretical explanation of the numerical results presented in support of this claim, describing three narrower classes of functions for which function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $O(\log(n) \, \varepsilon^{-2} \log(\varepsilon^{-1}))$ that is independent of the dimension $m$: (i) functions of the inner product of the two variables, (ii) functions of the Euclidean distance…
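To make the setting of the abstract concrete, the following is a minimal numerical sketch, not the paper's construction or the code in its repository. It builds an $n \times n$ matrix by sampling a function of the inner product of two $m$-dimensional variables (class (i) above) and measures the largest entrywise error of a rank-$r$ truncation. Everything in it is an illustrative assumption: the choice $f = \exp$, the unit-norm Gaussian sample points, the sizes, and the helper names `function_generated_matrix` and `max_entrywise_error`. Note that the truncated SVD is optimal in the spectral and Frobenius norms, not in the entrywise maximum norm, so it only gives an upper bound on the best achievable entrywise error at a given rank.

```python
import numpy as np


def function_generated_matrix(n, m, f, seed=0):
    """Sample A[i, j] = f(<x_i, y_j>) for n unit vectors x_i, y_j in R^m.

    The random unit-norm points are an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, m))
    Y = rng.standard_normal((n, m))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    return f(X @ Y.T)


def max_entrywise_error(A, r):
    """Largest absolute entry of A - A_r, where A_r is the rank-r truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_r = (U[:, :r] * s[:r]) @ Vt[:r]
    return np.max(np.abs(A - A_r))


n, m = 200, 50
A = function_generated_matrix(n, m, np.exp)

ranks = (1, 5, 20, n)
errors = [max_entrywise_error(A, r) for r in ranks]
for r, err in zip(ranks, errors):
    print(f"rank {r:3d}: max entrywise error {err:.3e}")
```

Rerunning the script with a different `m` while keeping `n` fixed is one informal way to probe how the error at a given rank responds to the dimension.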

Code & Models

Repositories

sbudzinskiy/low-rank-big-data
none · Official

Taxonomy

Methods: Softmax · Attention Is All You Need