Identifiability of Complete Dictionary Learning
Jérémy E. Cohen, Nicolas Gillis

TL;DR
This paper establishes new deterministic bounds for the sample complexity needed for unique recovery in complete dictionary learning, especially when data has low-rank structure, improving over previous combinatorial bounds.
Contribution
It provides the first non-asymptotic sample bounds for identifiability in deterministic low-rank dictionary learning scenarios, reducing sample complexity from combinatorial to polynomial in r.
Findings
Sample complexity bound of O(r^3/(r-k)^2) for identifiability.
Necessary lower bounds that challenge previous assumptions.
When the columns of B have a constant proportion of zeros, only O(r) samples are required for recovery.
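To get a feel for the gap between the two kinds of bound, the sketch below compares the growth term r^3/(r-k)^2 with the binomial coefficient C(r, k), which is assumed here to stand for the combinatorial term in earlier bounds. Constants hidden by the O(.) notation are dropped, so the numbers indicate growth rates only, not actual sample counts; the function names are illustrative and do not come from the paper.

```python
from math import comb


def polynomial_term(r: int, k: int) -> float:
    """Growth term r^3 / (r - k)^2 of the sufficient bound (constants dropped)."""
    if not 0 <= k < r:
        raise ValueError("need 0 <= k < r")
    return r**3 / (r - k) ** 2


def combinatorial_term(r: int, k: int) -> int:
    """Binomial coefficient C(r, k), assumed to model the earlier combinatorial term."""
    return comb(r, k)


if __name__ == "__main__":
    # Fix the proportion of zeros at one half (k = r / 2) and let r grow.
    for r in (10, 20, 40, 80):
        k = r // 2
        print(r, k, polynomial_term(r, k), combinatorial_term(r, k))
```

With k = r/2 the polynomial term simplifies to 4r, which is the linear behaviour described in the last finding, while C(r, r/2) grows exponentially in r.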
Abstract
Sparse component analysis (SCA), also known as complete dictionary learning, is the following problem: Given an input matrix M and an integer r, find a dictionary D with r columns and a matrix B with k-sparse columns (that is, each column of B has at most k non-zero entries) such that M ≈ DB. A key issue in SCA is identifiability, that is, characterizing the conditions under which D and B are essentially unique (that is, they are unique up to permutation and scaling of the columns of D and rows of B). Although SCA has been vastly investigated in the last two decades, only a few works have tackled this issue in the deterministic scenario, and no work provides reasonable bounds in the minimum number of samples (that is, columns of M) that leads to identifiability. In this work, we provide new results in the deterministic scenario when the data has a…
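The notion of essential uniqueness can be illustrated numerically. The sketch below writes M for the input matrix, D for the dictionary with r columns, and B for the coefficient matrix with k-sparse columns, then shows that permuting and rescaling the columns of D (with the matching inverse operation on the rows of B) gives a different factorization of the same M with the same sparsity. The dimensions, random seed, and variable names are arbitrary assumptions for the demonstration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, k, n = 6, 4, 2, 8  # illustrative sizes: D is m-by-r, B is r-by-n

# Random dictionary and a coefficient matrix with exactly k non-zeros per column.
D = rng.standard_normal((m, r))
B = np.zeros((r, n))
for j in range(n):
    support = rng.choice(r, size=k, replace=False)
    B[support, j] = rng.standard_normal(k)
M = D @ B

# Permutation matrix P and positive diagonal scaling S.
perm = rng.permutation(r)
P = np.eye(r)[:, perm]
S = np.diag(rng.uniform(0.5, 2.0, size=r))

# Equivalent factorization: columns of D permuted and scaled,
# rows of B permuted and scaled by the inverse.
D2 = D @ P @ S
B2 = np.linalg.inv(S) @ P.T @ B

print(np.allclose(D2 @ B2, M))                  # same data matrix
print(np.count_nonzero(B2, axis=0).max() <= k)  # columns remain k-sparse
```

Identifiability results ask when these permutations and scalings are the only ambiguity, so that any two valid factorizations are related in this way.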
