Deep Scattering Spectrum

Joakim Andén; Stéphane Mallat

arXiv:1304.6763·cs.SD·June 15, 2015

TL;DR

The paper introduces the deep scattering spectrum, which extends MFCC-style audio representations with cascades of wavelet convolutions and modulus operators, and reports state-of-the-art results on musical genre (GTZAN) and phone (TIMIT) classification.

Contribution

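It proposes a scattering transform for audio that is locally translation invariant and stable to time-warping deformations, with an additional transform along log-frequency for frequency transposition invariance, improving audio classification performance.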

Findings

01 Achieved state-of-the-art accuracy on GTZAN genre classification.

02 Attained top results on TIMIT phoneme classification.

03 Demonstrated robustness to time-warping and frequency transposition.

Abstract

A scattering transform defines a locally translation invariant representation which is stable to time-warping deformations. It extends MFCC representations by computing modulation spectrum coefficients of multiple orders, through cascades of wavelet convolutions and modulus operators. Second-order scattering coefficients characterize transient phenomena such as attacks and amplitude modulation. A frequency transposition invariant representation is obtained by applying a scattering transform along log-frequency. State-of-the-art classification results are obtained for musical genre and phone classification on the GTZAN and TIMIT databases, respectively.
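
The cascade described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes simplified Gaussian band-pass filters in place of the paper's wavelets, and the helper names (`gabor_bank`, `lowpass`, `scattering`) and all parameter values are hypothetical choices made for this example. First-order coefficients are computed as |x * psi1| * phi and second-order coefficients as ||x * psi1| * psi2| * phi, where phi is an averaging low-pass filter.

```python
import numpy as np

def gabor_bank(n, centers, q):
    """Gaussian band-pass filters defined in the frequency domain.
    Simplification (assumption): gains are zeroed at non-positive frequencies
    so each filter is analytic and has zero response at DC."""
    freqs = np.fft.fftfreq(n)                 # cycles per sample
    bank = []
    for xi in centers:
        sigma = xi / q                        # bandwidth proportional to center frequency
        g = np.exp(-0.5 * ((freqs - xi) / sigma) ** 2)
        g[freqs <= 0] = 0.0
        bank.append(g)
    return np.array(bank)

def lowpass(n, sigma):
    """Gaussian averaging filter phi (frequency domain, gain 1 at DC)."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-0.5 * (freqs / sigma) ** 2)

def scattering(x, centers1, centers2, q1=8, q2=1, sigma_phi=0.001):
    """First- and second-order scattering coefficients of a 1-D signal."""
    n = len(x)
    psi1 = gabor_bank(n, centers1, q1)
    psi2 = gabor_bank(n, centers2, q2)
    phi = lowpass(n, sigma_phi)

    # Order 1: wavelet convolution, modulus, then time averaging.
    U1 = np.abs(np.fft.ifft(np.fft.fft(x) * psi1, axis=-1))          # (n1, n)
    F1 = np.fft.fft(U1, axis=-1)
    S1 = np.real(np.fft.ifft(F1 * phi, axis=-1))

    # Order 2: repeat the wavelet-modulus step on each first-order envelope.
    U2 = np.abs(np.fft.ifft(F1[:, None, :] * psi2[None, :, :], axis=-1))  # (n1, n2, n)
    S2 = np.real(np.fft.ifft(np.fft.fft(U2, axis=-1) * phi, axis=-1))
    return S1, S2

# Demo: a 0.1 cycles/sample tone, with and without 0.005 cycles/sample amplitude modulation.
n = 4000
t = np.arange(n)
tone = np.cos(2 * np.pi * 0.1 * t)
modulated = (1 + 0.5 * np.cos(2 * np.pi * 0.005 * t)) * tone

centers1 = [0.05, 0.1, 0.2]        # first-order (acoustic) center frequencies
centers2 = [0.0025, 0.005, 0.01]   # second-order (modulation) center frequencies

S1_mod, S2_mod = scattering(modulated, centers1, centers2)
S1_tone, S2_tone = scattering(tone, centers1, centers2)

print("dominant first-order channel:", S1_mod.mean(axis=-1).argmax())
print("dominant modulation channel:", S2_mod[1].mean(axis=-1).argmax())
```

In this toy setting the averaging by phi removes the amplitude modulation from the first-order coefficients, so the two signals look alike at order one, while the second-order coefficients of the modulated tone concentrate in the channel tuned to the modulation rate. This mirrors the abstract's statement that second-order coefficients characterize amplitude modulation.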

Code & Models

Repositories

facebookresearch/tdfbanks
pytorch
