Shift-Invariance Sparse Coding for Audio Classification

Roger Grosse; Rajat Raina; Helen Kwong; Andrew Y. Ng

arXiv:1206.5241 · cs.LG · June 26, 2012 · 92 cites

TL;DR

This paper introduces an efficient algorithm for shift-invariant sparse coding (SISC), which learns basis functions that can appear at any time shift in time-series data. Features computed with these bases improve audio classification over traditional spectral and cepstral features.
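
To illustrate what "any time shift" means, here is a minimal sketch. It assumes circular convolution and a short basis zero-padded to the signal length. A single nonzero coefficient at offset τ places the whole basis at τ, so a feature that moves in time is explained by moving one coefficient rather than by learning a new basis. The function `shifted_reconstruction`, the 3-sample basis, and the offsets are hypothetical.

```python
import numpy as np

def shifted_reconstruction(coeffs, basis):
    """Return sum_tau coeffs[tau] * (basis shifted by tau), using circular shifts.
    coeffs: (T,) activation sequence, basis: (m,) short basis with m <= T."""
    T = coeffs.shape[0]
    padded = np.zeros(T)
    padded[: basis.shape[0]] = basis        # zero-pad the short basis to length T
    # Circular convolution computed as a product of FFTs.
    return np.fft.ifft(np.fft.fft(coeffs) * np.fft.fft(padded)).real

T = 32
basis = np.array([1.0, -2.0, 1.0])          # hypothetical 3-sample "feature"
signal = np.zeros(T)
signal[10:13] = basis                        # the feature occurs at offset 10

coeffs = np.zeros(T)
coeffs[10] = 1.0                             # one nonzero coefficient encodes the offset
assert np.allclose(shifted_reconstruction(coeffs, basis), signal)

# The same basis explains the feature at another offset; only the coefficient moves.
assert np.allclose(shifted_reconstruction(np.roll(coeffs, 7), basis), np.roll(signal, 7))
```

In ordinary (non-shift-invariant) sparse coding, each basis sits at a fixed position, so covering these two offsets would take two separate bases.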

Contribution

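The paper presents an efficient method for learning SISC bases that alternates between two convex optimization problems, one for the sparse coefficients and one for the bases. The basis problem is solved over complex-valued variables in the Fourier domain, where convolution becomes elementwise multiplication and the variables largely decouple. The learned bases yield better features for audio classification.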
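
The Fourier-domain idea can be illustrated with a minimal sketch. It assumes circular convolution and lets each basis span the full signal length T, a simplification. Under these assumptions, the basis update splits into T independent K-by-K complex least-squares problems, one per frequency. The paper constrains the norm of each basis; this sketch swaps in a ridge penalty `lam` for brevity. The names `reconstruct` and `solve_bases_fourier` are hypothetical.

```python
import numpy as np

def reconstruct(A, S):
    """Circular SISC reconstruction x_i = sum_k a_ik (*) s_k.
    A: (n, K, T) coefficients, S: (K, T) bases -> (n, T) signals."""
    Af = np.fft.fft(A, axis=-1)
    Sf = np.fft.fft(S, axis=-1)
    # Convolution in time is elementwise multiplication in frequency.
    return np.fft.ifft(np.einsum("nkt,kt->nt", Af, Sf), axis=-1).real

def solve_bases_fourier(X, A, lam=1e-3):
    """Basis step with coefficients fixed, solved one frequency at a time.
    Minimizes sum_i ||x_i - sum_k a_ik (*) s_k||^2 + lam * sum_k ||s_k||^2.
    X: (n, T) signals, A: (n, K, T) coefficients -> S: (K, T) bases."""
    n, K, T = A.shape
    Xf = np.fft.fft(X, axis=-1)             # (n, T)
    Af = np.fft.fft(A, axis=-1)             # (n, K, T)
    Sf = np.empty((K, T), dtype=complex)
    for f in range(T):
        M = Af[:, :, f]                     # (n, K) complex design matrix at frequency f
        # Complex ridge normal equations: (M^H M + lam I) s_f = M^H x_f.
        # By Parseval, both objective terms pick up the same 1/T factor under
        # numpy's unnormalized FFT, so the same lam applies in both domains.
        lhs = M.conj().T @ M + lam * np.eye(K)
        rhs = M.conj().T @ Xf[:, f]
        Sf[:, f] = np.linalg.solve(lhs, rhs)
    # Conjugate-symmetric frequencies yield conjugate solutions, so the
    # inverse FFT is real up to rounding error.
    return np.fft.ifft(Sf, axis=-1).real
```

The payoff is the loop structure: each frequency needs only a K-by-K solve, instead of one coupled system over all K·T basis values.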

Findings

01

SISC-learned features outperform spectral and cepstral features in speech and music classification.

02

The proposed algorithm efficiently computes exact solutions to the large L1-regularized coefficient problems, instead of heuristically optimizing only a small subset of the variables.

03

Because each basis can be used at every time shift, SISC learns features that do not depend on where an event occurs in the signal, which improves classification accuracy on audio tasks.
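
The coefficient subproblem from finding 02 can be sketched as follows. With the bases fixed, the SISC coefficients solve an L1-regularized least-squares problem in which every basis appears at every shift. The sketch below solves a circular-convolution version of that problem with ISTA (iterative soft-thresholding), a standard proximal-gradient method swapped in for illustration. It is not the authors' exact solver: ISTA reaches the exact minimizer only asymptotically. The name `sisc_coefficients_ista` and its parameters are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sisc_coefficients_ista(x, S, beta, n_iter=500):
    """Minimize ||x - sum_k a_k (*) s_k||^2 + beta * sum_k ||a_k||_1 with ISTA,
    where (*) is circular convolution.
    x: (T,) signal, S: (K, T) zero-padded bases, beta: L1 weight.
    Returns (A of shape (K, T), list of objective values per iteration)."""
    K, T = S.shape
    Sf = np.fft.fft(S, axis=-1)                       # (K, T)
    # Lipschitz constant of the gradient: 2 * largest eigenvalue of D^T D, where
    # D maps coefficients to the reconstruction. Circulant blocks are diagonalized
    # by the DFT, so that eigenvalue is max_f sum_k |S_k(f)|^2.
    L = 2.0 * np.max(np.sum(np.abs(Sf) ** 2, axis=0))
    A = np.zeros((K, T))

    def objective(A):
        recon = np.fft.ifft(np.sum(np.fft.fft(A, axis=-1) * Sf, axis=0)).real
        return np.sum((x - recon) ** 2) + beta * np.sum(np.abs(A))

    history = [objective(A)]
    for _ in range(n_iter):
        Af = np.fft.fft(A, axis=-1)
        resid = x - np.fft.ifft(np.sum(Af * Sf, axis=0)).real
        # Gradient w.r.t. a_k is -2 * circular cross-correlation of resid with s_k.
        grad = -2.0 * np.fft.ifft(np.conj(Sf) * np.fft.fft(resid)).real
        # Gradient step of size 1/L, then shrinkage: one proximal-gradient update.
        A = soft_threshold(A - grad / L, beta / L)
        history.append(objective(A))
    return A, history
```

As a sanity check, with a single impulse basis the convolution is the identity, and ISTA reaches the closed-form answer (`x` soft-thresholded at `beta / 2`) after one step.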

Abstract

Sparse coding is an unsupervised learning algorithm that learns a succinct high-level representation of the inputs given only unlabeled data; it represents each input as a sparse linear combination of a set of basis functions. Originally applied to modeling the human visual cortex, sparse coding has also been shown to be useful for self-taught learning, in which the goal is to solve a supervised classification task given access to additional unlabeled data drawn from different classes than that in the supervised learning problem. Shift-invariant sparse coding (SISC) is an extension of sparse coding which reconstructs a (usually time-series) input using all of the basis functions in all possible shifts. In this paper, we present an efficient algorithm for learning SISC bases. Our method is based on iteratively solving two large convex optimization problems: The first, which computes the…
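
For reference, the SISC learning problem described in the abstract can be written as follows. This is a sketch in notation chosen here, not copied from the paper: x^{(i)} are the inputs, s_j the basis functions, a_j^{(i)} the coefficient sequences, * convolution, β the sparsity weight, and c a bound on basis norms.

```latex
\min_{\{a^{(i)}_j\},\,\{s_j\}} \;
  \sum_{i} \Big\| x^{(i)} - \sum_{j} a^{(i)}_j * s_j \Big\|_2^2
  \;+\; \beta \sum_{i,j} \big\| a^{(i)}_j \big\|_1
\qquad \text{subject to} \qquad \|s_j\|_2^2 \le c \quad \text{for all } j.
```

The objective is not jointly convex. It is, however, convex in the coefficients when the bases are fixed (an L1-regularized least-squares problem), and convex in the bases when the coefficients are fixed (a norm-constrained least-squares problem). This is why the method alternates between two convex problems.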

Taxonomy

Topics: Blind Source Separation Techniques · Neuroscience and Music Perception · Music and Audio Processing