TL;DR
This paper introduces DB-KSVD, a scalable dictionary learning algorithm inspired by KSVD, capable of disentangling high-dimensional embeddings in large datasets, and demonstrates its effectiveness on transformer and image models.
Contribution
Proposes DB-KSVD, a scalable adaptation of KSVD for high-dimensional, large-scale dictionary learning that achieves results competitive with SAEs while using a different optimization approach.
Findings
DB-KSVD effectively disentangles text and image embeddings.
It achieves competitive results on SAEBench metrics.
Traditional optimization methods can scale to large, high-dimensional datasets.
Abstract
Dictionary learning has recently emerged as a promising approach for mechanistic interpretability of large transformer models. Disentangling high-dimensional transformer embeddings requires algorithms that scale to high-dimensional data with large sample sizes. Recent work has explored sparse autoencoders (SAEs) for this problem. However, SAEs use a simple linear encoder to solve the sparse encoding subproblem, which is known to be NP-hard. It is therefore interesting to understand whether this approach is sufficient to find good solutions to the dictionary learning problem or if a more sophisticated algorithm could find better solutions. In this work, we propose Double-Batch KSVD (DB-KSVD), a scalable dictionary learning algorithm that adapts the classic KSVD algorithm. DB-KSVD is informed by the rich theoretical foundations of KSVD but scales to datasets with millions of samples and…
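For orientation, the classic KSVD loop that the abstract refers to can be sketched as follows. This is a minimal NumPy illustration of textbook KSVD, not the paper's DB-KSVD and not its batching scheme: it alternates a greedy sparse-coding step (orthogonal matching pursuit, used here as a common heuristic for the NP-hard sparse encoding subproblem) with a per-atom rank-1 SVD update of the dictionary. The function names `omp` and `ksvd_step`, the column-wise data layout, and the sparsity parameter `k` are assumptions made for this example.

```python
import numpy as np

def omp(D, y, k):
    """Greedy orthogonal matching pursuit: approximate y with at most k atoms of D.

    D: (d, m) dictionary with unit-norm columns; y: (d,) sample; k >= 1.
    """
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # no new atom helps; stop early
        support.append(j)
        # Refit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def ksvd_step(D, Y, k):
    """One classic KSVD iteration on data Y of shape (d, n), one sample per column."""
    # 1) Sparse coding with the dictionary held fixed.
    X = np.stack([omp(D, Y[:, i], k) for i in range(Y.shape[1])], axis=1)
    D = D.copy()
    # 2) Dictionary update: revise one atom at a time.
    for j in range(D.shape[1]):
        users = np.nonzero(X[j])[0]  # samples whose code uses atom j
        if users.size == 0:
            continue  # unused atom is left unchanged in this sketch
        # Residual on those samples with atom j's own contribution added back.
        E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
        # The best rank-1 fit of E gives the new atom and its coefficients.
        U, S, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j] = U[:, 0]
        X[j, users] = S[0] * Vt[0]
    return D, X
```

In contrast, an SAE replaces the `omp` call with a single learned linear map followed by a nonlinearity, and trains encoder and dictionary jointly by gradient descent. The sketch above loops over samples and atoms sequentially, which is the cost that a scalable variant would need to avoid.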
