Shift-Invariant Sparse Coding for Audio Classification
Roger Grosse, Rajat Raina, Helen Kwong, Andrew Y. Ng

TL;DR
This paper introduces an efficient algorithm for shift-invariant sparse coding (SISC) that learns basis functions capturing shifted features in time-series data, improving audio classification performance over traditional spectral features.
Contribution
The paper presents a novel, efficient method for learning SISC bases by solving convex optimization problems in the Fourier domain, enabling better feature extraction for audio classification.
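One reason the Fourier domain helps is the convolution theorem: convolution in time becomes an element-wise product across frequencies. The following is a minimal sketch of that identity only, not the paper's solver. The function names `conv_direct` and `conv_fourier` and the random test signals are illustrative assumptions.

```python
import numpy as np

def conv_direct(basis, coeffs):
    # Full linear convolution computed in the time domain.
    return np.convolve(basis, coeffs)

def conv_fourier(basis, coeffs):
    # Zero-pad both signals to the full output length so that the
    # circular convolution implied by the FFT equals linear convolution.
    n = len(basis) + len(coeffs) - 1
    basis_f = np.fft.rfft(basis, n)
    coeffs_f = np.fft.rfft(coeffs, n)
    # Convolution in time is an element-wise product in frequency.
    return np.fft.irfft(basis_f * coeffs_f, n)

rng = np.random.default_rng(0)
basis = rng.standard_normal(16)    # hypothetical short basis function
coeffs = rng.standard_normal(64)   # hypothetical coefficient signal

max_gap = np.max(np.abs(conv_direct(basis, coeffs) - conv_fourier(basis, coeffs)))
print(max_gap)
```

Because the product acts on each frequency separately, an objective written in terms of convolutions can be handled one frequency at a time, which is the general intuition behind working in the Fourier domain.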
Findings
SISC-learned features outperform spectral and cepstral features in speech and music classification.
The proposed algorithm efficiently computes exact solutions for large-scale sparse coding problems.
SISC captures shift-invariant features, enhancing classification accuracy in audio domains.
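Shift invariance here means that a basis function may appear at any time offset in the signal, so the reconstruction is a sum of convolutions between each basis function and a sparse activation signal. The toy example below sketches this under assumed values: the helper `sisc_reconstruct`, the two short bases, and the activation positions are hypothetical and not taken from the paper.

```python
import numpy as np

def sisc_reconstruct(bases, activations):
    # Sum over bases of (activation signal convolved with that basis).
    return sum(np.convolve(s, a) for a, s in zip(bases, activations))

# Two hypothetical basis functions of length 3.
bases = [np.array([1.0, -1.0, 0.5]), np.array([0.2, 0.4, 0.2])]

# Sparse activations: one nonzero coefficient per basis.
n = 10
activations = [np.zeros(n), np.zeros(n)]
activations[0][2] = 1.0   # first basis placed at time step 2
activations[1][6] = 2.0   # second basis placed at time step 6, scaled by 2

x_hat = sisc_reconstruct(bases, activations)

# Delaying every activation by one step delays the reconstruction by one step;
# the bases themselves are unchanged.
shifted = [np.roll(s, 1) for s in activations]
x_hat_shifted = sisc_reconstruct(bases, shifted)
print(np.allclose(x_hat_shifted, np.roll(x_hat, 1)))
```

The same basis functions describe the feature wherever it occurs; only the position of the nonzero coefficient changes.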
Abstract
Sparse coding is an unsupervised learning algorithm that learns a succinct high-level representation of the inputs given only unlabeled data; it represents each input as a sparse linear combination of a set of basis functions. Originally applied to modeling the human visual cortex, sparse coding has also been shown to be useful for self-taught learning, in which the goal is to solve a supervised classification task given access to additional unlabeled data drawn from different classes than that in the supervised learning problem. Shift-invariant sparse coding (SISC) is an extension of sparse coding which reconstructs a (usually time-series) input using all of the basis functions in all possible shifts. In this paper, we present an efficient algorithm for learning SISC bases. Our method is based on iteratively solving two large convex optimization problems: The first, which computes the…
Taxonomy
Topics: Blind Source Separation Techniques · Neuroscience and Music Perception · Music and Audio Processing
